343 Commits

Author SHA1 Message Date
Christopher Berner
82eb5dee14 Resolve 2.0 TODOs 2024-03-14 20:55:54 -07:00
Markus Legner
da79ac2ba5 Fix symbol IDs assigned to encoding packets
Previously, we incorrectly assigned internal symbol IDs (ISIs) to the
`PayloadId` of `EncodingPacket`s. According to RFC 6330, it should be
the encoding symbol IDs (ESIs). This commit fixes this inconsistency.

BREAKING CHANGE: As the assignment of symbol IDs changes, encoding
packets generated before this change cannot be decoded by the new
version and vice-versa.
2024-03-11 20:43:29 -07:00
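
The ISI/ESI distinction in the commit above can be illustrated with a minimal sketch. It assumes the mapping described in RFC 6330, where source symbols keep their ID and repair symbols are shifted past the K' - K padding symbols. The function name `esi_to_isi` and its parameters are hypothetical and are not taken from the crate.

```rust
/// Maps an encoding symbol ID (ESI) to an internal symbol ID (ISI).
///
/// `k` is the number of source symbols in the block and `k_prime` is the
/// extended (padded) symbol count, with `k <= k_prime`.
/// Hypothetical helper for illustration only; not the crate's API.
fn esi_to_isi(esi: u32, k: u32, k_prime: u32) -> u32 {
    assert!(k <= k_prime, "K' must be at least K");
    if esi < k {
        // Source symbols: ESI and ISI coincide.
        esi
    } else {
        // Repair symbols: skip over the K' - K padding symbols.
        esi + (k_prime - k)
    }
}
```

With `k = 8` and `k_prime = 10`, the first repair symbol (ESI 8) maps to ISI 10, so a packet labelled with the ISI instead of the ESI would be misread by a conforming peer.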
Christopher Berner
1490c5a61f Fix alignment error on ARM 2024-03-04 19:15:53 -08:00
Christopher Berner
a3a0204585 Run clippy --fix 2023-11-26 11:32:44 -08:00
Christopher Berner
36d8fe89d2 Remove wasm support 2023-11-26 07:57:22 -08:00
Christopher Berner
cd1df04d92 Only set no_std when built without the std feature 2023-11-26 07:32:07 -08:00
Christopher Berner
7939144ef8 Fix division by zero when packet size is less than 32 2023-11-25 09:43:09 -08:00
Christopher Berner
eafdc58d0a Run cargo clippy --fix 2023-07-03 11:09:19 -07:00
Slesarew
5a720829fa
feat: support no_std (#143)
* feat: support no_std

`metal` feature supports `no_std` in configuration `default-features = false, features = ["metal"]`.
Float calculation is done via `micromath` crate.

All previously available functionality remains under default `std` feature.

Some tweaking of `python` and `wasm` features was done to compile tests.

* feat: get rid of floats (#2)

* feat: remove conversion to f64, fix features

* chore: uncomment symbols_required checker, fmt

* revert: add cdylib target for python support

* fix: generalize crate type

---------

Co-authored-by: varovainen <99664267+varovainen@users.noreply.github.com>
2023-02-02 18:07:41 -08:00
Pavel
02c80b595a
Added wasm build configuration (#136)
Co-authored-by: Christopher Berner <christopherberner@gmail.com>
2022-10-08 21:08:55 -07:00
Christopher Berner
a1e451349c Fix clippy warnings 2022-05-16 08:31:45 -07:00
Christopher Berner
9a47489160 Enable NEON optimized code path on aarch64 2022-05-16 08:31:45 -07:00
Christopher Berner
95286b9d0b Make extended_source_block_symbols unconditionally public 2022-02-07 19:10:01 -08:00
Christopher Berner
98a9806801 Update to 2021 edition 2021-10-21 18:47:08 -07:00
Christopher Berner
88959e05e6 Use vshrq_n_u8 in neon optimizations
Now that https://github.com/rust-lang/rust/issues/82072 is fixed this
intrinsic works and improves mulassign & FMA performance by ~30% on
Raspberry Pi 3 B+. End to end speedup is ~5%
2021-10-16 18:11:46 -07:00
Christopher Berner
2e0befc8df Fix remaining Clippy warnings 2021-07-27 22:51:14 -07:00
Christopher Berner
5a851083ed Fix some Clippy warnings 2021-07-27 22:00:38 -07:00
Christopher Berner
8b669faefd Fix panic in graph traversal
There was an off-by-one error in the initialization of storage for
connected components, such that if there were the same number of
connected components as nodes it would cause an index out of bounds
error
2021-07-27 21:46:46 -07:00
Christopher Berner
83101d6a7c Fix cargo test compilation error 2021-03-18 20:16:19 -07:00
Christopher Berner
893e1c7c79 Optimize fma with GF2 with NEON
Improves performance by ~2x on very large symbol counts
2021-02-14 17:15:40 -08:00
Christopher Berner
e06af58ce2 Work around bad compiler optimization with vshrq NEON instruction
The vshrq intrinsic incorrectly compiles to 16 single byte shift instructions.

This improves throughput by ~50%
2021-02-14 17:15:40 -08:00
Christopher Berner
a4db356932 Add NEON optimized mul_assign() function
Speeds up this op by ~2x
2021-02-14 17:15:40 -08:00
Christopher Berner
d0322d3ca4 Add NEON optimized FMA
Speeds up FMA by ~3x, and encoding throughput by ~50%
2021-02-14 17:15:40 -08:00
Christopher Berner
e3e9d6dcc2 Add NEON optimized implementation for octets::add_assign() 2021-02-14 17:15:40 -08:00
Christopher Berner
63b2aec337 Fix 1:255 chance of test failure.
The fused FMA function doesn't allow a scalar of 1
2021-02-09 17:55:03 -08:00
Christopher Berner
c134e5b93e Fix panic on non-x86 platforms
This panicked because fused_addassign_mul_scalar does not support
scalar=1, and it was used as the fallback
2021-02-09 17:55:03 -08:00
Christopher Berner
562e64d438 Optimize column swapping substep for r > 1
Improves performance by ~4%
2021-02-05 19:51:08 -08:00
Christopher Berner
1241928a84 Optimize column swapping substep for r=1
Improves performance by ~1%
2021-02-05 19:51:08 -08:00
Christopher Berner
67a90ede4e Replace retain() with position() + swap_remove()
This improves performance by 1-2%
2021-02-05 19:51:08 -08:00
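
A rough sketch of the general pattern named in the commit above, not the crate's actual code. It assumes the target value occurs at most once and that element order does not matter, since `swap_remove()` moves the last element into the vacated slot. Both function names are hypothetical.

```rust
/// Removes `target` by scanning and compacting the whole vector.
/// `retain()` always visits every element and shifts the survivors.
fn remove_with_retain(values: &mut Vec<usize>, target: usize) {
    values.retain(|&v| v != target);
}

/// Removes the first occurrence of `target` without preserving order.
/// `position()` stops at the first match and `swap_remove()` is O(1).
fn remove_with_swap(values: &mut Vec<usize>, target: usize) {
    if let Some(index) = values.iter().position(|&v| v == target) {
        values.swap_remove(index);
    }
}
```

The two functions are interchangeable only under the assumptions stated above; if duplicates are possible, `retain()` removes all of them while the second version removes just one.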
Christopher Berner
5e506b5b78 Merge .map().filter() into .filter_map() 2021-02-05 19:51:08 -08:00
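
For illustration, the kind of iterator rewrite this commit title describes might look like the following. The example parses strings and is unrelated to the crate's code; both function names are hypothetical.

```rust
/// Two-step version: map to a Result, filter the successes, then unwrap.
fn parse_two_step(inputs: &[&str]) -> Vec<u32> {
    inputs
        .iter()
        .map(|s| s.parse::<u32>())
        .filter(|r| r.is_ok())
        .map(|r| r.unwrap())
        .collect()
}

/// Merged version: filter_map() keeps only the `Some` values in one pass.
fn parse_one_step(inputs: &[&str]) -> Vec<u32> {
    inputs
        .iter()
        .filter_map(|s| s.parse::<u32>().ok())
        .collect()
}
```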
Christopher Berner
7cfef09bc6 Don't eliminate sparse values from HDPC
These are never read, except for debugging, and this improves perf by ~1%
2021-01-17 21:24:08 -08:00
Christopher Berner
42c08b85c8 Remove unnecessary condition
This is always true, since we're in the r = 1 case
2021-01-17 21:24:08 -08:00
Christopher Berner
fa2064796c Fix some Clippy warnings 2021-01-17 15:53:44 -08:00
Christopher Berner
eb07e23208 Optimize HDPC generation with recursive calculation
Improves performance on large symbol counts by > 10%
2021-01-17 15:31:47 -08:00
Christopher Berner
f36bf73ca9 Reduce calls to rand() during HDPC generation
Small improvement to performance. Perhaps 1%
2021-01-17 13:09:52 -08:00
Christopher Berner
6546b714ad Skip elimination in V section of A during first phase
This is safe due to Errata 11, and improves performance by a couple of
percent
2021-01-17 10:30:34 -08:00
Christopher Berner
24235dd213 Optimize first phase to call ones_in_column() only once for r = 1 case 2020-12-26 20:46:47 -08:00
Christopher Berner
602fc8711d Reduce length of merge chains in union-find data structure 2020-12-26 20:46:47 -08:00
Christopher Berner
3e88b065dd Optimize graph substep
Use a union-find data structure which is incrementally updated, instead
of always recomputing the entire graph

This improves performance by 5-10%
2020-12-26 10:05:53 -08:00
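
A minimal sketch of an incrementally updated union-find structure of the kind the commit above refers to. This is a textbook variant using union by size and path halving, which may differ from what the crate implements; the type and method names are hypothetical.

```rust
/// Disjoint-set structure over nodes 0..n, updated one edge at a time.
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        // Every node starts as the root of its own component.
        UnionFind { parent: (0..n).collect(), size: vec![1; n] }
    }

    /// Returns the root of `node`, halving the path along the way.
    fn find(&mut self, mut node: usize) -> usize {
        while self.parent[node] != node {
            let grandparent = self.parent[self.parent[node]];
            self.parent[node] = grandparent;
            node = grandparent;
        }
        node
    }

    /// Merges the components containing `a` and `b` (adds an edge).
    fn union(&mut self, a: usize, b: usize) {
        let (mut root_a, mut root_b) = (self.find(a), self.find(b));
        if root_a == root_b {
            return;
        }
        // Attach the smaller tree under the larger to keep chains short.
        if self.size[root_a] < self.size[root_b] {
            std::mem::swap(&mut root_a, &mut root_b);
        }
        self.parent[root_b] = root_a;
        self.size[root_a] += self.size[root_b];
    }

    /// Number of nodes in the component containing `node`.
    fn component_size(&mut self, node: usize) -> usize {
        let root = self.find(node);
        self.size[root]
    }
}
```

Adding an edge costs one `union()` call, so component sizes stay current without rebuilding the graph after every change.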
Christopher Berner
26c9c2f6a0 Remove eliminate_leading_value()
Also fix usage and semantics of selection helper .resize() method.
Previously, it said all values in first column had to be zero, but it
was called before those were eliminated
2020-12-26 10:05:53 -08:00
Christopher Berner
11d2de97f2 Update to rand 0.8 2020-12-19 13:14:12 -08:00
Christopher Berner
102c6a5a86 Optimize DenseBinaryMatrix
Switch to a single contiguous vector instead of vec of vecs

This improves performance by ~5%, especially for smaller symbol counts
2020-12-07 22:16:38 -08:00
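
The layout change described above can be sketched as follows, assuming a bit-packed matrix stored row-major in one `Vec<u64>`. The type `DenseBits` and its fields are hypothetical and only illustrate the idea of replacing a vec of vecs with a single allocation.

```rust
/// Bit matrix stored in a single contiguous, row-major vector of words.
struct DenseBits {
    row_words: usize, // number of u64 words per row
    words: Vec<u64>,
}

impl DenseBits {
    fn new(rows: usize, cols: usize) -> Self {
        // Round up so that a partial final word is still allocated.
        let row_words = (cols + 63) / 64;
        DenseBits { row_words, words: vec![0; rows * row_words] }
    }

    /// Returns the word index and bit mask for (row, col).
    fn locate(&self, row: usize, col: usize) -> (usize, u64) {
        (row * self.row_words + col / 64, 1u64 << (col % 64))
    }

    fn set(&mut self, row: usize, col: usize, value: bool) {
        let (word, mask) = self.locate(row, col);
        if value {
            self.words[word] |= mask;
        } else {
            self.words[word] &= !mask;
        }
    }

    fn get(&self, row: usize, col: usize) -> bool {
        let (word, mask) = self.locate(row, col);
        (self.words[word] & mask) != 0
    }
}
```

A single allocation avoids one pointer dereference per row access and keeps adjacent rows close in memory, which is one plausible reason the change helps most at small symbol counts.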
Christopher Berner
7b0d1c5cff Remove X matrix from release builds
This improves performance on small symbol counts by ~5%
2020-12-06 15:36:43 -08:00
Christopher Berner
5c13e8de6e Further optimize fused_addassign_mul_scalar_binary_avx2()
Move calculation of control flags out of loop to avoid one OR
instruction inside loop. Also statically enable BMI1 and detect its
presence to ensure that BEXTR2 intrinsic is inlined

Improves performance by ~5% on small symbol counts
2020-12-06 15:36:43 -08:00
Christopher Berner
d87e46c625 Optimize DenseBinaryMatrix.swap_columns()
Improves performance by 10-15% for symbol count = 100
2020-12-06 08:15:21 -08:00
Christopher Berner
3a05d7be3e Add BinaryOctetVec
Improves encoding speed of large symbol counts by ~5%
2020-12-06 08:15:21 -08:00
Christopher Berner
50301e1b5b Optimize query_non_zero_columns()
This reduces the time spent in the fourth phase from ~6% of encoding
time to ~1%, according to perf, and improves overall throughput by 3-4%
on large symbol counts.
2020-11-29 09:51:56 -08:00
Christopher Berner
c4d227fba1 Optimize memory layout of dense U matrix
Previously we used column major ordering. Switch to row major to
optimize sequential access of rows which is much more common in the
first phase, and can also be used in the fourth phase

This improves performance by ~10% on large symbol counts
2020-11-28 21:08:35 -08:00
Christopher Berner
6245ab1c9a Fix over-allocation of memory for dense U section of matrix
The previous code had an off-by-one error leading to an extra word being
allocated for each row
2020-11-28 17:22:50 -08:00
Christopher Berner
3a4068a726 Fix typo in spelling of "access" 2020-11-28 17:22:50 -08:00