raptorq

mirror of https://github.com/cberner/raptorq.git synced 2024-06-27 09:19:02 +00:00

Author	SHA1	Message	Date
Christopher Berner	e027ef2af0	Rename .cargo/config to .cargo/config.toml. This matches the Cargo documentation https://doc.rust-lang.org/cargo/reference/config.html and should fix cargo-deny's parsing	2021-03-28 09:49:20 -07:00
Christopher Berner	ee2407cb7c	Add support for building Python bindings on M1 Mac	2021-03-28 09:49:20 -07:00
Christopher Berner	393b7096d9	Fix building Python bindings on Mac	2021-03-28 09:49:20 -07:00
Christopher Berner	c5e0188db3	Add CI for Mac	2021-03-28 09:49:20 -07:00
Christopher Berner	83101d6a7c	Fix cargo test compilation error	2021-03-18 20:16:19 -07:00
Christopher Berner	c2a2e8a7c3	Bump version to 1.6.3 for release	2021-02-17 21:51:13 -08:00
Christopher Berner	c2e8a94a11	Update Raspberry Pi 3 B+ benchmarks. The previous benchmarks were run with a faulty power supply which artificially lowered the clock speed	2021-02-15 16:21:31 -08:00
Christopher Berner	893e1c7c79	Optimize fma with GF2 with NEON. Improves performance by ~2x on very large symbol counts	2021-02-14 17:15:40 -08:00
Christopher Berner	8bbc99c5cd	Update benchmarks on Raspberry Pi	2021-02-14 17:15:40 -08:00
Christopher Berner	e06af58ce2	Work around bad compiler optimization with vshrq NEON instruction. The vshrq intrinsic incorrectly compiles to 16 single byte shift instructions. This improves throughput by ~50%	2021-02-14 17:15:40 -08:00
Christopher Berner	a4db356932	Add NEON optimized mul_assign() function. Speeds up this op by ~2x	2021-02-14 17:15:40 -08:00
Christopher Berner	d0322d3ca4	Add NEON optimized FMA. Speeds up FMA by ~3x, and encoding throughput by ~50%	2021-02-14 17:15:40 -08:00
Christopher Berner	e3e9d6dcc2	Add NEON optimized implementation for octets::add_assign()	2021-02-14 17:15:40 -08:00
Christopher Berner	c1fa4e1f8e	Add benchmarks on Raspberry Pi 3 B+	2021-02-09 17:55:03 -08:00
Christopher Berner	63b2aec337	Fix 1:255 chance of test failure. The fused FMA function doesn't allow a scalar of 1	2021-02-09 17:55:03 -08:00
Christopher Berner	c134e5b93e	Fix panic on non-x86 platforms. This panicked because fused_addassign_mul_scalar does not support scalar=1, and it was used as the fallback	2021-02-09 17:55:03 -08:00
Christopher Berner	30ed32e720	Update benchmarks. Note: pre-built plan benchmarks improved because I fixed the RAM config on my computer to increase bandwidth, not because of code changes	2021-02-05 19:51:08 -08:00
Christopher Berner	562e64d438	Optimize column swapping substep for r > 1. Improves performance by ~4%	2021-02-05 19:51:08 -08:00
Christopher Berner	1241928a84	Optimize column swapping substep for r=1. Improves performance by ~1%	2021-02-05 19:51:08 -08:00
Christopher Berner	67a90ede4e	Replace retain() with position() + swap_remove(). This improves performance by 1-2%	2021-02-05 19:51:08 -08:00
Christopher Berner	5e506b5b78	Merge .map().filter() into .filter_map()	2021-02-05 19:51:08 -08:00
Christopher Berner	a1d5894e25	Update benchmarks	2021-01-17 21:58:06 -08:00
Christopher Berner	7cfef09bc6	Don't eliminate sparse values from HDPC. These are never read, except for debugging, and this improves perf by ~1%	2021-01-17 21:24:08 -08:00
Christopher Berner	42c08b85c8	Remove unnecessary condition. This is always true, since we're in the r = 1 case	2021-01-17 21:24:08 -08:00
Christopher Berner	fa2064796c	Fix some Clippy warnings	2021-01-17 15:53:44 -08:00
Christopher Berner	eb07e23208	Optimize HDPC generation with recursive calculation. Improves performance on large symbol counts by > 10%	2021-01-17 15:31:47 -08:00
Christopher Berner	f36bf73ca9	Reduce calls to rand() during HDPC generation. Small improvement to performance. Perhaps 1%	2021-01-17 13:09:52 -08:00
Christopher Berner	6546b714ad	Skip elimination in V section of A during first phase. This is safe due to Errata 11, and speeds up performance by a couple percent	2021-01-17 10:30:34 -08:00
Christopher Berner	905f78cfd0	Fix py_publish upload script	2021-01-15 19:42:47 -08:00
Christopher Berner	102ae0f7d6	Bump version to 1.6.2 for release	2021-01-14 21:40:46 -08:00
Christopher Berner	6d0f5e1b76	Build Python package with abi3 support. This allows the package to be used on any Python version >= 3.6	2021-01-14 21:33:19 -08:00
Christopher Berner	24235dd213	Optimize first phase to call ones_in_column() only once for r = 1 case	2020-12-26 20:46:47 -08:00
Christopher Berner	602fc8711d	Reduce length of merge chains in union-find data structure	2020-12-26 20:46:47 -08:00
Christopher Berner	3cc21f5b42	Update readme with new benchmarks	2020-12-26 10:22:47 -08:00
Christopher Berner	3e88b065dd	Optimize graph substep. Use a union-find data structure which is incrementally updated, instead of always recomputing the entire graph. This improves performance by 5-10%	2020-12-26 10:05:53 -08:00
Christopher Berner	81d7cbbc65	Fix ordered list numbering in Errata document	2020-12-26 10:05:53 -08:00
Christopher Berner	26c9c2f6a0	Remove eliminate_leading_value(). Also fix usage and semantics of selection helper .resize() method. Previously, it said all values in first column had to be zero, but it was called before those were eliminated	2020-12-26 10:05:53 -08:00
Christopher Berner	a0b06313de	Fix compilation error in codec_benchmark from rand upgrade	2020-12-26 10:05:53 -08:00
Christopher Berner	11d2de97f2	Update to rand 0.8	2020-12-19 13:14:12 -08:00
Christopher Berner	02ca60d2d4	Bump version to 1.6.1 for release	2020-12-09 10:24:16 -08:00
Christopher Berner	102c6a5a86	Optimize DenseBinaryMatrix. Switch to a single contiguous vector instead of vec of vecs. This improves performance by ~5%, especially for smaller symbol counts	2020-12-07 22:16:38 -08:00
Christopher Berner	7b0d1c5cff	Remove X matrix from release builds. This improves performance on small symbol counts by ~5%	2020-12-06 15:36:43 -08:00
Christopher Berner	5c13e8de6e	Further optimize fused_addassign_mul_scalar_binary_avx2(). Move calculation of control flags out of loop to avoid one OR instruction inside loop. Also statically enable BMI1 and detect its presence to ensure that BEXTR2 intrinsic is inlined. Improves performance by ~5% on small symbol counts	2020-12-06 15:36:43 -08:00
Christopher Berner	b4d4cfd273	Disable lto to fix perf debug symbol mapping	2020-12-06 15:36:43 -08:00
Christopher Berner	d87e46c625	Optimize DenseBinaryMatrix.swap_columns(). Improves performance by 10-15% for symbol count = 100	2020-12-06 08:15:21 -08:00
Christopher Berner	3a05d7be3e	Add BinaryOctetVec. Improves encoding speed of large symbol counts by ~5%	2020-12-06 08:15:21 -08:00
Christopher Berner	50301e1b5b	Optimize query_non_zero_columns(). This reduces the time spent in the fourth phase from ~6% of encoding time to ~1%, according to perf, and improves overall throughput by 3-4% on large symbol counts.	2020-11-29 09:51:56 -08:00
Christopher Berner	c4d227fba1	Optimize memory layout of dense U matrix. Previously we used column major ordering. Switch to row major to optimize sequential access of rows which is much more common in the first phase, and can also be used in the fourth phase. This improves performance by ~10% on large symbol counts	2020-11-28 21:08:35 -08:00
Christopher Berner	6245ab1c9a	Fix over-allocation of memory for dense U section of matrix. The previous code had an off by one error leading to an extra word being allocated for each row	2020-11-28 17:22:50 -08:00
Christopher Berner	3a4068a726	Fix typo in spelling of "access"	2020-11-28 17:22:50 -08:00
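
Commit 67a90ede4e mentions replacing retain() with position() + swap_remove(). The actual change is not shown in this log, so the following is only a minimal sketch of the general technique using the standard Vec methods. The function names and the u32 element type are hypothetical and are not taken from the raptorq source. It assumes the target value appears at most once and that element order does not matter.

```rust
/// Hypothetical helper: remove `target` using `retain()`.
/// `retain()` visits every element and keeps the remaining ones in order,
/// so the cost is always proportional to the full length of the vector.
fn remove_with_retain(values: &mut Vec<u32>, target: u32) {
    values.retain(|&v| v != target);
}

/// Hypothetical helper: remove the first occurrence of `target` using
/// `position()` + `swap_remove()`.
/// `position()` stops at the first match, and `swap_remove()` fills the hole
/// with the last element in O(1), at the cost of not preserving order.
/// Returns `true` if an element was removed.
fn remove_with_swap_remove(values: &mut Vec<u32>, target: u32) -> bool {
    if let Some(index) = values.iter().position(|&v| v == target) {
        values.swap_remove(index);
        true
    } else {
        false
    }
}
```

The two helpers leave the same set of elements behind when the target is unique. They differ only in ordering: swap_remove() moves the last element into the vacated slot, so this substitution is only appropriate where the caller does not depend on order.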

476 Commits