Implementation in blowfish-x86_64 uses 64bit rotations which are slow on P4,
making blowfish-x86_64 slower than generic C implementation. Therefore
blacklist P4.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Performance of twofish-x86_64-3way on Intel Pentium 4 and Atom is lower than
of twofish-x86_64 module. So blacklist these CPUs.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since LRW & XTS are selected by serpent-sse2, we don't need these #ifdefs
anymore.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since LRW & XTS are selected by twofish-x86_64-3way, we don't need these
#ifdefs anymore.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LRW/XTS patches for serpent-sse2 forgot to add this. CRYPTO_TFM_REQ_MAY_SLEEP
should be cleared as sleeping between kernel_fpu_begin()/kernel_fpu_end() is
not allowed.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
These files aren't just exporting symbols -- they are also defining
a MODULE_LICENSE etc. so give them the full module.h file.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
module. New assembler functions crypt data in three blocks chunks, improving
cipher performance on out-of-order CPUs.
Patch has been tested with tcrypt and automated filesystem tests.
Summary of the tcrypt benchmarks:
Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
encrypt: 1.3x speed
decrypt: 1.3x speed
Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
encrypt: 1.07x speed
decrypt: 1.4x speed
Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
encrypt: 1.4x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
encrypt: 1.0x speed
decrypt: 1.0x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
encrypt: 0.84x speed
decrypt: 1.09x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
encrypt: 1.15x speed
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txthttp://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txthttp://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt
Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
Also userspace test were run on:
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping : 11
Userspace test results:
Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
encrypt: 1.27x
decrypt: 1.25x
Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
encrypt: 1.36x
decrypt: 1.36x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This needed by 3-way twofish patch to be able to easily use one block
assembler functions. As glue code is shared between i586/x86_64 apply
change to i586 assembler too. Also export assembler functions for
3-way parallel twofish module.
CC: Joachim Fritschi <jfritschi@freenet.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds improved F-macro for 4-way parallel functions. With new
F-macro for 4-way parallel functions, blowfish sees ~15% improvement in
speed tests on AMD Phenom II (~5% on Intel Xeon E7330).
However when used in 1-way blowfish function new macro would be ~10%
slower than original, so old F-macro is kept for 1-way functions.
Patch cleans up old F-macro as it is no longer needed in 4-way part.
Patch also does register macro renaming to reduce stack usage.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Include <asm/aes.h> to pick up the declarations for crypto_aes_encrypt_x86
and crypto_aes_decrypt_x86 to quiet the sparse noise:
warning: symbol 'crypto_aes_encrypt_x86' was not declared. Should it be static?
warning: symbol 'crypto_aes_decrypt_x86' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Patch adds x86_64 assembly implementation of blowfish. Two set of assembler
functions are provided. First set is regular 'one-block at time'
encrypt/decrypt functions. Second is 'four-block at time' functions that
gain performance increase on out-of-order CPUs. Performance of 4-way
functions should be equal to 1-way functions with in-order CPUs.
Summary of the tcrypt benchmarks:
Blowfish assembler vs blowfish C (256bit 8kb block ECB)
encrypt: 2.2x speed
decrypt: 2.3x speed
Blowfish assembler vs blowfish C (256bit 8kb block CBC)
encrypt: 1.12x speed
decrypt: 2.5x speed
Blowfish assembler vs blowfish C (256bit 8kb block CTR)
encrypt: 2.5x speed
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txthttp://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt
Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
stepping : 0
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).
Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
faster.
Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.
With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.
Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, meassuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).
Using this algorithm on a IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
FTB-8510:
frame size sha1-generic sha1-ssse3 delta
64 byte 37.5 MBit/s 37.5 MBit/s 0.0%
128 byte 56.3 MBit/s 62.5 MBit/s +11.0%
256 byte 87.5 MBit/s 100.0 MBit/s +14.3%
512 byte 131.3 MBit/s 150.0 MBit/s +14.2%
1024 byte 162.5 MBit/s 193.8 MBit/s +19.3%
1280 byte 175.0 MBit/s 212.5 MBit/s +21.4%
1420 byte 175.0 MBit/s 218.7 MBit/s +25.0%
1518 byte 150.0 MBit/s 181.2 MBit/s +20.8%
The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make there way through the IPsec tunnel.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix build error on i386 by moving function prototypes:
arch/x86/crypto/aesni-intel_glue.c: In function 'aesni_init':
arch/x86/crypto/aesni-intel_glue.c:1263: error: implicit declaration of function 'crypto_fpu_init'
arch/x86/crypto/aesni-intel_glue.c: In function 'aesni_exit':
arch/x86/crypto/aesni-intel_glue.c:1373: error: implicit declaration of function 'crypto_fpu_exit'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Loading fpu without aesni-intel does nothing. Loading aesni-intel
without fpu causes modes like xts to fail. (Unloading
aesni-intel will restore those modes.)
One solution would be to make aesni-intel depend on fpu, but it
seems cleaner to just combine the modules.
This is probably responsible for bugs like:
https://bugzilla.redhat.com/show_bug.cgi?id=589390
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes problem with packets that are not multiple of 64bytes.
Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Flush TLB if PGD entry is changed in i386 PAE mode
x86, dumpstack: Correct stack dump info when frame pointer is available
x86: Clean up csum-copy_64.S a bit
x86: Fix common misspellings
x86: Fix misspelling and align params
x86: Use PentiumPro-optimized partial_csum() on VIA C7
They were generated by 'codespell' and then manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix up previous patch that failed to properly fix mem leak in
rfc4106_set_hash_subkey(). This add-on patch; fixes the leak. moves
kfree() out of the error path, returns -ENOMEM rather than -EINVAL when
ablkcipher_request_alloc() fails.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There's a small memory leak in
arch/x86/crypto/aesni-intel_glue.c::rfc4106_set_hash_subkey(). If the call
to kmalloc() fails and returns NULL then the memory allocated previously
by ablkcipher_request_alloc() is not freed when we leave the function.
I could have just added a call to ablkcipher_request_free() before we
return -ENOMEM, but that started to look too much like the code we
already had at the end of the function, so I chose instead to rework the
code a bit so that there are now a few labels at the end that we goto when
various allocations fail, so we don't have to repeat the same blocks of
code (this also reduces the object code size slightly).
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add missing header file:
arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the problem with 2.16 binutils.
Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com>
Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers
not available on x86-32.
While at it, fixed unregister order in aesni_exit().
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES-NI instructions are also available in legacy mode so the 32-bit
architecture may profit from those, too.
To illustrate the performance gain here's a short summary of a dm-crypt
speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
implementations:
x86: i568 aes-ni delta
ECB, 256 bit: 93.8 MB/s 123.3 MB/s +31.4%
CBC, 256 bit: 84.8 MB/s 262.3 MB/s +209.3%
LRW, 256 bit: 108.6 MB/s 222.1 MB/s +104.5%
XTS, 256 bit: 105.0 MB/s 205.5 MB/s +95.7%
Additionally, due to some minor optimizations, the 64-bit version also
got a minor performance gain as seen below:
x86-64: old impl. new impl. delta
ECB, 256 bit: 121.1 MB/s 123.0 MB/s +1.5%
CBC, 256 bit: 285.3 MB/s 290.8 MB/s +1.9%
LRW, 256 bit: 263.7 MB/s 265.3 MB/s +0.6%
XTS, 256 bit: 251.1 MB/s 255.3 MB/s +1.7%
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds an optimized RFC4106 AES-GCM implementation for 64-bit
kernels. It supports 128-bit AES key size. This leverages the crypto
AEAD interface type to facilitate a combined AES & GCM operation to
be implemented in assembly code. The assembly code leverages Intel(R)
AES New Instructions and the PCLMULQDQ instruction.
Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com>
Signed-off-by: Erdinc Ozturk <erdinc.ozturk@intel.com>
Signed-off-by: James Guilford <james.guilford@intel.com>
Signed-off-by: Wajdi Feghali <wajdi.k.feghali@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Andrew Morton reported that AES-NI CTR optimization failed to compile
with gas 2.16.1, the error message is as follow:
arch/x86/crypto/aesni-intel_asm.S: Assembler messages:
arch/x86/crypto/aesni-intel_asm.S:752: Error: suffix or operands invalid for `movq'
arch/x86/crypto/aesni-intel_asm.S:753: Error: suffix or operands invalid for `movq'
To fix this, a gas macro is defined to assemble movq with 64bit
general purpose registers and XMM registers. The macro will generate
the raw .byte sequence for needed instructions.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To take advantage of the hardware pipeline implementation of AES-NI
instructions. CTR mode cryption is implemented in ASM to schedule
multiple AES-NI instructions one after another. This way, some latency
of AES-NI instruction can be eliminated.
Performance testing based on dm-crypt should 50% reduction of
ecryption/decryption time.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Lbswap_mask, Lpoly and Ltwo_one should clearly belong to
.data section, not .text.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Old binutils do not support PCLMULQDQ-NI and PSHUFB, to make kernel
can be compiled by them, .byte code is used instead of assembly
instructions. But the readability and flexibility of raw .byte code is
not good.
So corresponding assembly instruction like gas macro is used instead.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Old binutils do not support AES-NI instructions, to make kernel can be
compiled by them, .byte code is used instead of AES-NI assembly
instructions. But the readability and flexibility of raw .byte code is
not good.
So corresponding assembly instruction like gas macro is used instead.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
function is changed too, from mesuring whether kernel is using FPU,
that is, the FPU is NOT available, to measuring whether FPU is usable,
that is, the FPU is available.
But the usage of irq_fpu_usable in ghash-clmulni-intel_glue.c is not
changed accordingly. This patch fixes this.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Old gases don't have a clue what pshufb stands for so we have
to hard-code it for now.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
function is changed too, from mesuring whether kernel is using FPU,
that is, the FPU is NOT available, to measuring whether FPU is usable,
that is, the FPU is available.
But the usage of irq_fpu_usable in aesni-intel_glue.c is not changed
accordingly. This patch fixes this.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
carry-less multiplication. More information about PCLMULQDQ can be
found at:
http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/
Because PCLMULQDQ changes XMM state, its usage must be enclosed with
kernel_fpu_begin/end, which can be used only in process context, the
acceleration is implemented as crypto_ahash. That is, request in soft
IRQ context will be defered to the cryptd kernel thread.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>