Pavel Emelyanov [Mon, 15 Oct 2007 09:33:45 +0000 (02:33 -0700)]
[INET]: Collect common frag sysctl variables together
Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.
I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:
1. to keep them in the __read_mostly section;
2. not to export the whole inet_frags objects outside.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 15 Oct 2007 09:31:52 +0000 (02:31 -0700)]
[INET]: Collect frag queues management objects together
There are some objects that are common in all the places
which are used to keep track of frag queues, they are:
* hash table
* LRU list
* rw lock
* rnd number for hash function
* the number of queues
* the amount of memory occupied by queues
* secret timer
Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like
- write_lock(&ipfrag_lock);
+ write_lock(&ip4_frags.lock);
To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 15 Oct 2007 09:24:19 +0000 (02:24 -0700)]
[INET]: Move common fields from frag_queues in one place.
Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:
* struct ipq in ipv4/ip_fragment.c
* struct nf_ct_frag6_queue in nf_conntrack_reasm.c
* struct frag_queue in ipv6/reassembly.c
After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like
- atomic_dec(&fq->refcnt);
+ atomic_dec(&fq->q.refcnt);
(these occupy most of the patch)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 15 Oct 2007 09:12:26 +0000 (02:12 -0700)]
[TG3]: Fix performance regression on 5705.
A performance regression was introduced by the following commit:
commit
ee6a99b539a50b4e9398938a0a6d37f8bf911550
Author: Michael Chan <mchan@broadcom.com>
Date: Wed Jul 18 21:49:10 2007 -0700
[TG3]: Fix msi issue with kexec/kdump.
In making that change, the PCI latency timer and cache line size
registers were not restored after chip reset. On the 5705, the
latency timer gets reset to 0 during chip reset and this causes
very poor performance.
Update version to 3.84.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Keil [Mon, 15 Oct 2007 09:11:44 +0000 (02:11 -0700)]
[ISDN]: Remove local copy of device name to make sure renames work.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Mon, 15 Oct 2007 09:10:32 +0000 (02:10 -0700)]
[TCP]: high_seq parameter removed (all callers use tp->high_seq)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Oct 2007 08:51:38 +0000 (01:51 -0700)]
[IPV6]: Uninline netfilter okfns
Uninline netfilter okfns for those cases where gcc can generate tail-calls.
Before:
text data bss dec hex filename
8994153 1016524 524652
10535329 a0c1a1 vmlinux
After:
text data bss dec hex filename
8992761 1016524 524652
10533937 a0bc31 vmlinux
-------------------------------------------------------
-1392
All cases have been verified to generate tail-calls with and without netfilter.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Oct 2007 08:50:09 +0000 (01:50 -0700)]
[BRIDGE]: Remove SKB share checks in br_nf_pre_routing().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Oct 2007 08:48:39 +0000 (01:48 -0700)]
[IPV4]: Uninline netfilter okfns
Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
can generate tail calls for some of the netfilter hook okfn invocations,
so there is no need to inline the functions anymore. This caused huge
code bloat since we ended up with one inlined version and one out-of-line
version since we pass the address to nf_hook_slow.
Before:
text data bss dec hex filename
8997385 1016524 524652
10538561 a0ce41 vmlinux
After:
text data bss dec hex filename
8994009 1016524 524652
10535185 a0c111 vmlinux
-------------------------------------------------------
-3376
All cases have been verified to generate tail-calls with and without
netfilter. The okfns in ipmr and xfrm4_input still remain inline because
gcc can't generate tail-calls for them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 15 Oct 2007 08:47:15 +0000 (01:47 -0700)]
[NET]: Avoid copying TCP packets unnecessarily
TCP packets all have writable heads, that is, even though it's cloned, it is
writable up to the end of the TCP header. This patch makes skb_checksum_help
aware of this fact by using skb_clone_writable and avoiding a copy for TCP.
I've also modified the BUG_ON tests to be unsigned. The only case where this
makes a difference is if csum_start points to a location before skb->data.
Since skb->data should always include the header where the checksum field
is (and all currently callers adhere to that), this change is safe and may
uncover bugs later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 15 Oct 2007 08:46:08 +0000 (01:46 -0700)]
[NET]: Fix csum_start update in pskb_expand_head
I got confused by the dual nature of the off variable in the
function pskb_expand_head. The csum_start offset should use
nhead instead of off which can change depending on whether we
are using offsets or pointers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Mon, 15 Oct 2007 08:42:31 +0000 (01:42 -0700)]
[NIU]: getting rid of __ucmpdi2 in niu.o
By the time we get to that switch by PHY type, we have 8bit
value. No need to keep it in u64 when u8 would do.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Juhl [Mon, 15 Oct 2007 08:39:12 +0000 (01:39 -0700)]
[NETLINK]: Don't leak 'listeners' in netlink_kernel_create()
The Coverity checker spotted that we'll leak the storage allocated
to 'listeners' in netlink_kernel_create() when the
if (!nl_table[unit].registered)
check is false.
This patch avoids the leak.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Mon, 15 Oct 2007 08:37:55 +0000 (01:37 -0700)]
[IPV6] __inet6_csk_dst_store(): fix check-after-use
The Coverity checker spotted that we have already oops'ed if "dst" was
NULL.
Since "dst" being NULL doesn't seem to be possible at this point this
patch removes the NULL check.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Oct 2007 08:36:24 +0000 (01:36 -0700)]
[NIU]: Fix write past end of array in niu_pci_probe_sprom().
Noticed by Coverity checker and reported by Adrian Bunk.
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 15 Oct 2007 08:29:10 +0000 (01:29 -0700)]
[IPV6]: Avoid skb_copy/pskb_copy/skb_realloc_headroom on input
This patch replaces unnecessary uses of skb_copy by pskb_expand_head
on the IPv6 input path.
This allows us to remove the double pointers later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 15 Oct 2007 08:28:47 +0000 (01:28 -0700)]
[IPV6]: Make ipv6_frag_rcv return the same packet
This patch implements the same change taht was done to ip_defrag. It
makes ipv6_frag_rcv return the last packet received of a train of fragments
rather than the head of that sequence.
This allows us to get rid of the sk_buff ** argument later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 15 Oct 2007 07:53:15 +0000 (00:53 -0700)]
[NETFILTER]: Replace sk_buff ** with sk_buff *
With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:39:55 +0000 (00:39 -0700)]
[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom
This patch replaces unnecessary uses of skb_copy, pskb_copy and
skb_realloc_headroom by functions such as skb_make_writable and
pskb_expand_head.
This allows us to remove the double pointers later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:39:33 +0000 (00:39 -0700)]
[IPVS]: Replace local version of skb_make_writable
This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:39:18 +0000 (00:39 -0700)]
[NETFILTER]: Do not copy skb in skb_make_writable
Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:39:01 +0000 (00:39 -0700)]
[BRIDGE]: Unshare skb upon entry
Due to the special location of the bridging hook, it should never see a
shared packet anyway (certainly not with any in-kernel code). So it
makes sense to unshare the skb there if necessary as that will greatly
simplify the code below it (in particular, netfilter).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:38:47 +0000 (00:38 -0700)]
[NET]: Avoid unnecessary cloning for ingress filtering
As it is we always invoke pt_prev before ing_filter, even if there are no
ingress filters attached. This can cause unnecessary cloning in pt_prev.
This patch changes it so that we only invoke pt_prev if there are ingress
filters attached.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:38:32 +0000 (00:38 -0700)]
[IPV4]: Change ip_defrag to return an integer
Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead. This patch does
that and updates all its callers accordingly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:38:15 +0000 (00:38 -0700)]
[IPV4]: Make ip_defrag return the same packet
This patch is a bit of a hack. However it is worth it if you consider that
this is the only reason why we have to carry around the struct sk_buff **
pointers in netfilter.
It makes ip_defrag always return the packet that was given to it on input.
It does this by cloning the packet and replacing its original contents with
the head fragment if necessary.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:37:52 +0000 (00:37 -0700)]
[SKBUFF]: Add skb_morph
This patch creates a new function skb_morph that's just like skb_clone
except that it lets user provide the spare skb that will be overwritten
by the one that's to be cloned.
This will be used by IP fragment reassembly so that we get back the same
skb that went in last (rather than the head skb that we get now which
requires us to carry around double pointers all over the place).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 14 Oct 2007 07:37:30 +0000 (00:37 -0700)]
[SKBUFF]: Merge common code between copy_skb_header and skb_clone
This patch creates a new function __copy_skb_header to merge the common
code between copy_skb_header and skb_clone. Having two functions which
are largely the same is a source of wasted labour as well as confusion.
In fact the tc_verd stuff is almost certainly a bug since it's treated
differently in skb_clone compared to the callers of copy_skb_header
(skb_copy/pskb_copy/skb_copy_expand).
I've kept that difference in tact with a comment added asking for
clarification.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 15 Oct 2007 17:46:05 +0000 (10:46 -0700)]
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
NFSv4: Fix a typo in nfs_inode_reclaim_delegation
NFS: Add a boot parameter to disable 64 bit inode numbers
NFS: nfs_refresh_inode should clear cache_validity flags on success
NFS: Fix a connectathon regression in NFSv3 and NFSv4
NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
SUNRPC: Don't call xprt_release in call refresh
SUNRPC: Don't call xprt_release() if call_allocate fails
SUNRPC: Fix buggy UDP transmission
[23/37] Clean up duplicate includes in
[2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
SUNRPC: Use correct type in buffer length calculations
SUNRPC: Fix default hostname created in rpc_create()
nfs: add server port to rpc_pipe info file
NFS: Get rid of some obsolete macros
NFS: Simplify filehandle revalidation
NFS: Ensure that nfs_link() returns a hashed dentry
NFS: Be strict about dentry revalidation when doing exclusive create
NFS: Don't zap the readdir caches upon error
NFS: Remove the redundant nfs_reval_fsid()
NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
...
Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
Linus Torvalds [Mon, 15 Oct 2007 17:40:41 +0000 (10:40 -0700)]
Merge branch 'v2.6.24-lockdep' of git://git./linux/kernel/git/peterz/linux-2.6-lockdep
* 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
lockdep: annotate dir vs file i_mutex
lockdep: per filesystem inode lock class
lockdep: annotate kprobes irq fiddling
lockdep: annotate rcu_read_{,un}lock{,_bh}
lockdep: annotate journal_start()
lockdep: s390: connect the sysexit hook
lockdep: x86_64: connect the sysexit hook
lockdep: i386: connect the sysexit hook
lockdep: syscall exit check
lockdep: fixup mutex annotations
lockdep: fix mismatched lockdep_depth/curr_chain_hash
lockdep: Avoid /proc/lockdep & lock_stat infinite output
lockdep: maintainers
Linus Torvalds [Mon, 15 Oct 2007 16:57:54 +0000 (09:57 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] update sn2_defconfig
[IA64] Fix kernel hangup in kdump on INIT
[IA64] Fix kernel panic in kdump on INIT
[IA64] Remove vector from ia64_machine_kexec()
[IA64] Fix race when multiple cpus go through MCA
[IA64] Remove needless delay in MCA rendezvous
[IA64] add driver for ACPI methods to call native firmware
[IA64] abstract SAL_CALL wrapper to allow other firmware entry points
[IA64] perfmon: Remove exit_pfm_fs()
[IA64] tree-wide: Misc __cpu{initdata, init, exit} annotations
Linus Torvalds [Mon, 15 Oct 2007 16:07:58 +0000 (09:07 -0700)]
Get rid of unused variable warning in drivers/pci/hotplug/pci_hotplug_core.c
Commit
5a7ad7f044941316dc98eda2a087a12a7a50649d removed all uses of
'retval', but didn't remove the variable itself.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jes Sorensen [Wed, 19 Sep 2007 09:54:55 +0000 (11:54 +0200)]
[IA64] update sn2_defconfig
Update defonfig file for sn2 to match recent changes in config options.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Mon, 15 Oct 2007 15:22:16 +0000 (08:22 -0700)]
Merge git://git./linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits)
sched: sync wakeups preempt too
sched: affine sync wakeups
sched: guest CPU accounting: maintain guest state in KVM
sched: guest CPU accounting: maintain stats in account_system_time()
sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
sched: guest CPU accounting: add guest-CPU /proc/stat field
sched: domain sysctl fixes: add terminator comment
sched: domain sysctl fixes: do not crash on allocation failure
sched: domain sysctl fixes: unregister the sysctl table before domains
sched: domain sysctl fixes: use for_each_online_cpu()
sched: domain sysctl fixes: use kcalloc()
Make scheduler debug file operations const
sched: enable wake-idle on CONFIG_SCHED_MC=y
sched: reintroduce topology.h tunings
sched: allow the immediate migration of cache-cold tasks
sched: debug, improve migration statistics
sched: debug: increase width of debug line
sched: activate task_hot() only on fair-scheduled tasks
sched: reintroduce cache-hot affinity
sched: speed up context-switches a bit
...
Linus Torvalds [Mon, 15 Oct 2007 15:19:33 +0000 (08:19 -0700)]
Merge /pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
[SCSI] gdth: fix CONFIG_ISA build failure
[SCSI] esp_scsi: remove __dev{init,exit}
[SCSI] gdth: !use_sg cleanup and use of scsi accessors
[SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
[SCSI] gdth: Setup proper per-command private data
[SCSI] gdth: Remove gdth_ctr_tab[]
[SCSI] gdth: switch to modern scsi host registration
[SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
[SCSI] gdth: clean up host private data
[SCSI] gdth: Remove virt hosts
[SCSI] gdth: Reorder scsi_host_template intitializers
[SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
[SCSI] gdth: Remove 2.4.x support, in-kernel changelog
[SCSI] gdth: split out pci probing
[SCSI] gdth: split out eisa probing
[SCSI] gdth: split out isa probing
gdth: Make one abuse of scsi_cmnd less obvious
[SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
[SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
[SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
...
Linus Torvalds [Mon, 15 Oct 2007 15:18:44 +0000 (08:18 -0700)]
Merge branch 'agp-patches' of /linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/agp-2.6:
fix use after free in amd create gatt pages
AGP fix race condition between unmapping and freeing pages
Linus Torvalds [Mon, 15 Oct 2007 15:17:26 +0000 (08:17 -0700)]
Merge branch 'drm-patches' of ssh:///linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
via invalid device ids removal
radeon: Commit the ring after each partial texture upload blit.
i915: fix vbl swap allocation size.
drm: Replace DRM_IOCTL_ARGS with (dev, data, file_priv) and remove DRM_DEVICE.
drm: remove XFREE86_VERSION macros.
drm: Replace filp in ioctl arguments with drm_file *file_priv.
drm: Remove DRM_ERR OS macro.
Linus Torvalds [Mon, 15 Oct 2007 15:16:53 +0000 (08:16 -0700)]
Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux
* 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
knfsd: nfsv4 delegation recall should take reference on client
knfsd: don't shutdown callbacks until nfsv4 client is freed
knfsd: let nfsd manage timing out its own leases
knfsd: Add source address to sunrpc svc errors
knfsd: 64 bit ino support for NFS server
svcgss: move init code into separate function
knfsd: remove code duplication in nfsd4_setclientid()
nfsd warning fix
knfsd: fix callback rpc cred
knfsd: move nfsv4 slab creation/destruction to module init/exit
knfsd: spawn kernel thread to probe callback channel
knfsd: nfs4 name->id mapping not correctly parsing negative downcall
knfsd: demote some printk()s to dprintk()s
knfsd: cleanup of nfsd4 cmp_* functions
knfsd: delete code made redundant by map_new_errors
nfsd: fix horrible indentation in nfsd_setattr
nfsd: remove unused cache_for_each macro
nfsd: tone down inaccurate dprintk
Geert Uytterhoeven [Mon, 15 Oct 2007 09:51:03 +0000 (11:51 +0200)]
PS3 system bus add_uevent_var() fallout
Kill unused variables
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Kosina [Mon, 15 Oct 2007 13:17:41 +0000 (15:17 +0200)]
HID: fix HIDIOCGRDESC memory access in hidraw
Fix bogus copying of data into userspace when HIDIOCGRDESC is issued.
HID-transport layer makes sure that dev->hid->rdesc is not larger than
HID_MAX_DESCRIPTOR_SIZE.
Noticed-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Mon, 15 Oct 2007 15:00:20 +0000 (17:00 +0200)]
sched: sync wakeups preempt too
make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: affine sync wakeups
make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: maintain guest state in KVM
Modify KVM to update guest time accounting.
[ mingo@elte.hu: ported to 2.6.24 KVM. ]
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: maintain stats in account_system_time()
modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.
Modify /proc/<pid>/stat to display these new fields.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: add guest-CPU /proc/stat field
as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run virtual CPU. Modify /proc/stat to display this
new field.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: add terminator comment
we had an incorrect-terminator bug in sd_alloc_ctl_domain_table()
before, so add a comment that documents it.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: do not crash on allocation failure
Now that we are calling this at runtime, a more relaxed error path is
suggested. If an allocation fails, we just register the partial table,
which will show empty directories.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: unregister the sysctl table before domains
Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains. This prevents the
sysctl table from pointing to freed memory for root to write.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: use for_each_online_cpu()
init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables. If the cpus_possible mask is not contigious this will result
in a crash referencing unallocated data. If the online mask is not
contigious then we would show offline cpus and miss online ones.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: use kcalloc()
kcalloc checks for n * sizeof(element) overflows and it zeros.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
Make scheduler debug file operations const
In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: enable wake-idle on CONFIG_SCHED_MC=y
most multicore CPUs today have shared L2 caches, so tune things so
that the spreading amongst cores is more aggressive.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: reintroduce topology.h tunings
reintroduce the 2.6.22 topology.h tunings again - they result in
slightly better balancing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: allow the immediate migration of cache-cold tasks
allow the immediate migration of cache-cold tasks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: debug, improve migration statistics
add new migration statistics when SCHED_DEBUG and SCHEDSTATS
is enabled. Available in /proc/<PID>/sched.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: debug: increase width of debug line
increase width of debug line - in preparation of more debugging info.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: activate task_hot() only on fair-scheduled tasks
activate task_hot() only for fair-scheduled tasks (i.e. disable it
for RT tasks).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: reintroduce cache-hot affinity
reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: speed up context-switches a bit
speed up context-switches a bit by not clearing p->exec_start.
(as a side-effect, this also makes p->exec_start a universal timestamp
available to cache-hot estimations.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: do not wakeup-preempt with SCHED_BATCH tasks
do not wakeup-preempt with SCHED_BATCH tasks, their preemption
is batched too, driven by the tick.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: generate uevents for user creation/destruction
Generate uevents when a user is being created/destroyed. These events
can be used to configure cpu share of a new user.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: do not normalize kernel threads via SysRq-N
do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.
pointed out by Andi Kleen.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: remove stale comment from sched_group_set_shares()
remove stale comment from sched_group_set_shares().
Function never returns -EINVAL.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:15 +0000 (17:00 +0200)]
sched: clean up is_migration_thread()
clean up is_migration_thread() and turn it into an inline function.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Mon, 15 Oct 2007 15:00:15 +0000 (17:00 +0200)]
sched: cleanup: refactor normalize_rt_tasks
Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: refactor common code of sleep_on / wait_for_completion
Refactor common code of sleep_on / wait_for_completion
These functions were largely cut'n'pasted. This moves
the common code into single helpers instead. Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: remove unnecessary gotos
Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;
No functional changes.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: update comment
update comment: clarify time-slices and remove obsolete tuning detail.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: prevent wakeup over-scheduling
Prevent wakeup over-scheduling. Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept. Instead, the
task is marked for preemption at the next tick. Tasks of higher
priority still preempt immediately.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: disable forced preemption by default
Implement feature bit to disable forced preemption. This way
it can be checked whether a workload is overscheduling or not.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix group scheduling for SCHED_BATCH
The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
possibility of 'p' being always a valid address.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zou Nan hai [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: some proc entries are missed in sched_domain sys_ctl debug code
cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.
This patch fixes the issue.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Gautham R Shenoy [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix rt ptracer monopolizing CPU
yield() in wait_task_inactive(), can cause a high priority thread to be
scheduled back in, and there by loop forever while it is waiting for some
lower priority thread which is unfortunately still on the runqueue.
Use schedule_timeout_uninterruptible(1) instead.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Credit: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dhaval Giani [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: group scheduling, sysfs tunables
Add tunables in sysfs to modify a user's cpu share.
A directory is created in sysfs for each new user in the system.
/sys/kernel/uids/<uid>/cpu_share
Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.
Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: disable sleeper_fairness on SCHED_BATCH
disable sleeper fairness for batch tasks - they are about
batch processing after all.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: another wakeup_granularity fix
unit mis-match: wakeup_gran was used against a vruntime
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul E. McKenney [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: export cpu_clock()
export cpu_clock() - the preferred API instead of sched_clock().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix: move the CPU check into ->task_new_fair()
noticed by Peter Zijlstra:
fix: move the CPU check into ->task_new_fair(), this way we
can call place_entity() and get child ->vruntime right at
initial wakeup time.
(without this there can be large latencies)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: function prototype cleanups
noticed by Thomas Gleixner:
cleanup: function prototype cleanups - move into single line
wherever possible.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: rename task_grp to task_group
cleanup: rename task_grp to task_group. No need to save two characters
and 'grp' is annoying to read.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG
cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG, to
make SCHED_FEAT_ names more consistent.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: kfree(NULL) is valid
kfree(NULL) is valid.
pointed out by checkpatch.pl.
the fix shrinks the code a bit:
text data bss dec hex filename
40024 3842 100 43966 abbe sched.o.before
40002 3842 100 43944 aba8 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: style cleanup
fix up __setup() style bug - noticed via checkpatch.pl.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: break out if printing a warning in sched_domain_debug()
checkpatch.pl and Andy Whitcroft noticed the following bug: we did
not break out after printing an error.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y
run sched_domain_debug() if CONFIG_SCHED_DEBUG=y, instead
of relying on the hand-crafted SCHED_DOMAIN_DEBUG switch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: cleanup, remove the TASK_NONINTERACTIVE flag
Here's another piece of low hanging obsolete fruit.
Remove obsolete TASK_NONINTERACTIVE.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
make dequeue_entity() / enqueue_entity() and update_stats_dequeue() /
update_stats_enqueue() look similar, structure-wise.
zero effect, functionality-wise:
text data bss dec hex filename
34550 3026 100 37676 932c sched.o.before
34550 3026 100 37676 932c sched.o.after
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: cleanup, remove calc_weighted()
remove obsolete code -- calc_weighted()
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: tidy up SCHED_RR
- make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio [1] ;
- remove obsolete code (timeslice related bits);
- make sched_rr_get_interval() return something more
meaningful [2] for SCHED_OTHER tasks.
[1] according to the following link, it's not compliant with SUSv3
(not sure though, what is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656
[2] the interval is dynamic and can be depicted as follows "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by the
scheduler tick".
(i.e. it's more precise if a task is runnable at the moment)
yeah, this seems to require task_rq_lock/unlock() but this is not a hot
path.
results:
(SCHED_FIFO)
dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval
time_slice: 0 : 0
(SCHED_RR)
dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval
time_slice: 0 :
99984800
(SCHED_NORMAL)
dimm@earth:~/storage/prog$ ./rr_interval
time_slice: 0 :
19996960
(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result)
dimm@earth:~/storage/prog$ taskset 1 ./rr_interval
time_slice: 0 :
9998480
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexey Dobriyan [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: uninline scheduler
* save ~300 bytes
* activate_idle_task() was moved to avoid a warning
bloat-o-meter output:
add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <===
function old new delta
__enqueue_entity - 165 +165
finish_task_switch - 110 +110
update_curr_rt - 79 +79
__load_balance_iterator - 32 +32
__task_rq_unlock - 28 +28
find_process_by_pid - 24 +24
do_sched_setscheduler 133 123 -10
sys_sched_rr_get_interval 176 165 -11
sys_sched_getparam 156 145 -11
normalize_rt_tasks 482 470 -12
sched_getaffinity 112 99 -13
sys_sched_getscheduler 86 72 -14
sched_setaffinity 226 212 -14
sched_setscheduler 666 642 -24
load_balance_start_fair 33 9 -24
load_balance_next_fair 33 9 -24
dequeue_task_rt 133 67 -66
put_prev_task_rt 97 28 -69
schedule_tail 133 50 -83
schedule 682 594 -88
enqueue_entity 499 366 -133
task_new_fair 317 180 -137
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: tweak wakeup granularity
tweak wakeup granularity.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: optimize schedule() a bit on SMP
optimize schedule() a bit on SMP, by moving the rq-clock update
outside the rq lock.
code size is the same:
text data bss dec hex filename
25725 2666 96 28487 6f47 sched.o.before
25725 2666 96 28487 6f47 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: fix __pick_next_entity()
The thing is that __pick_next_entity() must never be called when
first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node'
be the very first field of 'struct sched_entity' (and it's the second).
The 'nr_running != 0' check is _not_ enough, due to the fact that
'current' is not within the tree. Generic paths are ok (e.g. schedule()
as put_prev_task() is called previously)... I'm more worried about e.g.
migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if
'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL
tasks (or imagine, some other use-cases in the future -- i.e. we should
not make outer world dependent on internal details of sched_fair class)
-- it may be "Houston, we've got a problem" case.
it's +16 bytes to the ".text". Another variant is to make 'run_node' the
first data member of 'struct sched_entity' but an additional check (se !
= NULL) is still needed in pick_next_entity().
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)]
sched: vslice fixups for non-0 nice levels
Make vslice accurate wrt nice levels, and add some comments
while we're at it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: whitespace cleanups
more whitespace cleanups. No code changed:
text data bss dec hex filename
26553 2790 288 29631 73bf sched.o.before
26553 2790 288 29631 73bf sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: mark scheduling classes as const
mark scheduling classes as const. The speeds up the code
a bit and shrinks it:
text data bss dec hex filename
40027 4018 292 44337 ad31 sched.o.before
40190 3842 292 44324 ad24 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler, fix latency
There is a possibility that because of task of a group moving from one
cpu to another, it may gain more cpu time that desired. See
http://marc.info/?l=linux-kernel&m=
119073197730334 for details.
This is an attempt to fix that problem. Basically it simulates dequeue
of higher level entities as if they are going to sleep. Similarly it
simulate wakeup of higher level entities as if they are waking up from
sleep.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler, fix bloat
Recent fix to check_preempt_wakeup() to check for preemption at higher
levels caused a size bloat for !CONFIG_FAIR_GROUP_SCHED.
Fix the problem.
42277 10598 320 53195 cfcb kernel/sched.o-before_this_patch
42216 10598 320 53134 cf8e kernel/sched.o-after_this_patch
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: group scheduler, fix coding style issues
Fix coding style issues reported by Randy Dunlap and others
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)]
sched: cleanup, remove stale comment
cleanup, remove stale comment.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>