445 Commits

Author SHA1 Message Date
Christopher Berner
afe2e83206 Bump version to 1.7.0 2022-05-17 06:55:40 -07:00
Christopher Berner
a1e451349c Fix clippy warnings 2022-05-16 08:31:45 -07:00
Christopher Berner
9a47489160 Enable NEON optimized code path on aarch64 2022-05-16 08:31:45 -07:00
Christopher Berner
28136b2d39 Upgrade pyo3 dependency 2022-03-23 16:21:23 -07:00
Christopher Berner
d2ad552972 Bump version to 1.6.5 2022-02-07 19:10:45 -08:00
Christopher Berner
95286b9d0b Make extended_source_block_symbols unconditionally public 2022-02-07 19:10:01 -08:00
Christopher Berner
bd5a5ec6bd Update pyo3 to 0.15 2021-12-05 14:04:00 -08:00
Christopher Berner
dde61692ea Add benchmark numbers on Ryzen 5900X 2021-12-05 14:04:00 -08:00
Christopher Berner
98a9806801 Update to 2021 edition 2021-10-21 18:47:08 -07:00
Christopher Berner
49d4c83c6b Update readme with latest benchmarks on ARM 2021-10-16 18:11:46 -07:00
Christopher Berner
88959e05e6 Use vshrq_n_u8 in neon optimizations
Now that https://github.com/rust-lang/rust/issues/82072 is fixed, this
intrinsic works and improves mulassign & FMA performance by ~30% on
Raspberry Pi 3 B+. End-to-end speedup is ~5%
2021-10-16 18:11:46 -07:00
Christopher Berner
1a4a62e64a Update README badges 2021-07-28 20:38:58 -07:00
Christopher Berner
d5f754747f Pin maturin version 2021-07-28 20:22:51 -07:00
Christopher Berner
8f885195d6 Bump version to 1.6.4 2021-07-28 19:43:19 -07:00
Christopher Berner
2e0befc8df Fix remaining Clippy warnings 2021-07-27 22:51:14 -07:00
Christopher Berner
39a26759cc Update pyo3 2021-07-27 22:51:14 -07:00
Christopher Berner
5a851083ed Fix some Clippy warnings 2021-07-27 22:00:38 -07:00
Christopher Berner
8b669faefd Fix panic in graph traversal
There was an off-by-one error in the initialization of storage for
connected components, such that if there were the same number of
connected components as nodes it would cause an index out of bounds
error
2021-07-27 21:46:46 -07:00
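
The off-by-one described in the commit above can be illustrated with a minimal sketch. It assumes, hypothetically, that component ids are 1-based with 0 reserved for "unassigned"; the `Components` type and its fields are invented for illustration and are not taken from the crate. Under that assumption, storage sized to the node count alone is one slot short when every node forms its own component.

```rust
/// Hypothetical tracker: component ids are 1-based, 0 means "unassigned".
struct Components {
    sizes: Vec<usize>, // indexed by component id
    next_id: usize,
}

impl Components {
    fn new(max_nodes: usize) -> Self {
        // One slot per possible component plus the reserved id 0.
        // Allocating only `max_nodes` slots would go out of bounds when
        // there are as many components as nodes.
        Components {
            sizes: vec![0; max_nodes + 1],
            next_id: 1,
        }
    }

    /// Registers a new single-node component and returns its id.
    fn add_singleton(&mut self) -> usize {
        let id = self.next_id;
        self.sizes[id] = 1;
        self.next_id += 1;
        id
    }
}
```
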
Christopher Berner
2b13544514 Fix warnings from cargo-deny config on 0.9.x 2021-06-06 16:11:45 -10:00
Christopher Berner
e027ef2af0 Rename .cargo/config to .cargo/config.toml
This matches the Cargo documentation https://doc.rust-lang.org/cargo/reference/config.html
and should fix cargo-deny's parsing
2021-03-28 09:49:20 -07:00
Christopher Berner
ee2407cb7c Add support for building Python bindings on M1 Mac 2021-03-28 09:49:20 -07:00
Christopher Berner
393b7096d9 Fix building Python bindings on Mac 2021-03-28 09:49:20 -07:00
Christopher Berner
c5e0188db3 Add CI for Mac 2021-03-28 09:49:20 -07:00
Christopher Berner
83101d6a7c Fix cargo test compilation error 2021-03-18 20:16:19 -07:00
Christopher Berner
c2a2e8a7c3 Bump version to 1.6.3 for release 2021-02-17 21:51:13 -08:00
Christopher Berner
c2e8a94a11 Update Raspberry Pi 3 B+ benchmarks
The previous benchmarks were run with a faulty power supply which
artificially lowered the clock speed
2021-02-15 16:21:31 -08:00
Christopher Berner
893e1c7c79 Optimize fma with GF2 with NEON
Improves performance by ~2x on very large symbol counts
2021-02-14 17:15:40 -08:00
Christopher Berner
8bbc99c5cd Update benchmarks on Raspberry Pi 2021-02-14 17:15:40 -08:00
Christopher Berner
e06af58ce2 Work around bad compiler optimization with vshrq NEON instruction
The vshrq intrinsic incorrectly compiles to 16 single byte shift instructions.

This improves throughput by ~50%
2021-02-14 17:15:40 -08:00
Christopher Berner
a4db356932 Add NEON optimized mul_assign() function
Speeds up this op by ~2x
2021-02-14 17:15:40 -08:00
Christopher Berner
d0322d3ca4 Add NEON optimized FMA
Speeds up FMA by ~3x, and encoding throughput by ~50%
2021-02-14 17:15:40 -08:00
Christopher Berner
e3e9d6dcc2 Add NEON optimized implementation for octets::add_assign() 2021-02-14 17:15:40 -08:00
Christopher Berner
c1fa4e1f8e Add benchmarks on Raspberry Pi 3 B+ 2021-02-09 17:55:03 -08:00
Christopher Berner
63b2aec337 Fix 1:255 chance of test failure.
The fused FMA function doesn't allow a scalar of 1
2021-02-09 17:55:03 -08:00
Christopher Berner
c134e5b93e Fix panic on non-x86 platforms
This panicked because fused_addassign_mul_scalar does not support
scalar=1, and it was used as the fallback
2021-02-09 17:55:03 -08:00
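
A rough sketch of why a fallback needs to special-case scalar=1: in GF(256), multiplying by 1 leaves a value unchanged, so `dst += src * 1` reduces to a plain XOR and can be routed around a fused routine that rejects that scalar. The names `gf256_mul` and `fma_fallback` are hypothetical, and the field polynomial is assumed to be the one from RFC 6330 (x^8 + x^4 + x^3 + x^2 + 1); this is not the crate's actual code.

```rust
/// Multiplies two GF(256) elements bit by bit (no lookup tables).
/// Assumes the RFC 6330 field polynomial x^8 + x^4 + x^3 + x^2 + 1.
fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            product ^= a;
        }
        let overflow = a & 0x80 != 0;
        a <<= 1;
        if overflow {
            a ^= 0x1D; // reduce by the low byte of the field polynomial
        }
        b >>= 1;
    }
    product
}

/// Hypothetical scalar fallback for `dst += src * scalar` over GF(256).
fn fma_fallback(dst: &mut [u8], src: &[u8], scalar: u8) {
    assert_eq!(dst.len(), src.len());
    match scalar {
        // Multiplying by zero contributes nothing.
        0 => {}
        // Multiplying by one is the identity, so this is a plain XOR.
        1 => {
            for (d, s) in dst.iter_mut().zip(src) {
                *d ^= *s;
            }
        }
        // General case: multiply each source octet, then add (XOR).
        _ => {
            for (d, s) in dst.iter_mut().zip(src) {
                *d ^= gf256_mul(*s, scalar);
            }
        }
    }
}
```
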
Christopher Berner
30ed32e720 Update benchmarks
Note: pre-built plan benchmarks improved because I fixed the RAM config
on my computer to increase bandwidth, not because of code changes
2021-02-05 19:51:08 -08:00
Christopher Berner
562e64d438 Optimize column swapping substep for r > 1
Improves performance by ~4%
2021-02-05 19:51:08 -08:00
Christopher Berner
1241928a84 Optimize column swapping substep for r=1
Improves performance by ~1%
2021-02-05 19:51:08 -08:00
Christopher Berner
67a90ede4e Replace retain() with position() + swap_remove()
This improves performance by 1-2%
2021-02-05 19:51:08 -08:00
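
The general pattern behind this change can be sketched as follows. `retain()` visits every element and shifts survivors to keep their order, while `position()` stops at the first match and `swap_remove()` fills the hole with the last element in constant time. The two are interchangeable only when at most one element matches and element order does not matter; whether that holds at the crate's call site is an assumption here, and the function names are hypothetical.

```rust
/// Removes every element equal to `target`, preserving order.
fn remove_with_retain(values: &mut Vec<u32>, target: u32) {
    values.retain(|&v| v != target);
}

/// Removes the first element equal to `target` without preserving order.
/// Returns true if an element was removed.
fn remove_first_unordered(values: &mut Vec<u32>, target: u32) -> bool {
    match values.iter().position(|&v| v == target) {
        Some(index) => {
            // The last element is moved into `index`; nothing else shifts.
            values.swap_remove(index);
            true
        }
        None => false,
    }
}
```
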
Christopher Berner
5e506b5b78 Merge .map().filter() into .filter_map() 2021-02-05 19:51:08 -08:00
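
As an illustration of this kind of refactor (not the crate's actual code), a `.map()` followed by a `.filter()` on the mapped value can be folded into a single `.filter_map()` closure that returns `Some` for values to keep and `None` for values to drop. The function names and the threshold below are hypothetical.

```rust
/// Two adaptors: transform each value, then filter on the result.
fn doubled_over_ten_two_step(values: &[u32]) -> Vec<u32> {
    values.iter().map(|&v| v * 2).filter(|&d| d > 10).collect()
}

/// One adaptor: transform and filter in the same closure.
fn doubled_over_ten_one_step(values: &[u32]) -> Vec<u32> {
    values
        .iter()
        .filter_map(|&v| {
            let doubled = v * 2;
            if doubled > 10 {
                Some(doubled)
            } else {
                None
            }
        })
        .collect()
}
```
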
Christopher Berner
a1d5894e25 Update benchmarks 2021-01-17 21:58:06 -08:00
Christopher Berner
7cfef09bc6 Don't eliminate sparse values from HDPC
These are never read, except for debugging, and this improves perf by ~1%
2021-01-17 21:24:08 -08:00
Christopher Berner
42c08b85c8 Remove unnecessary condition
This is always true, since we're in the r = 1 case
2021-01-17 21:24:08 -08:00
Christopher Berner
fa2064796c Fix some Clippy warnings 2021-01-17 15:53:44 -08:00
Christopher Berner
eb07e23208 Optimize HDPC generation with recursive calculation
Improves performance on large symbol counts by > 10%
2021-01-17 15:31:47 -08:00
Christopher Berner
f36bf73ca9 Reduce calls to rand() during HDPC generation
Small performance improvement, perhaps 1%
2021-01-17 13:09:52 -08:00
Christopher Berner
6546b714ad Skip elimination in V section of A during first phase
This is safe due to Errata 11, and improves performance by a couple of
percent
2021-01-17 10:30:34 -08:00
Christopher Berner
905f78cfd0 Fix py_publish upload script 2021-01-15 19:42:47 -08:00
Christopher Berner
102ae0f7d6 Bump version to 1.6.2 for release 2021-01-14 21:40:46 -08:00
Christopher Berner
6d0f5e1b76 Build Python package with abi3 support
This allows the package to be used on any Python version >= 3.6
2021-01-14 21:33:19 -08:00