Christopher Berner
e06af58ce2
Workaround bad compiler optimization with vshrq NEON instruction
...
The vshrq intrinsic incorrectly compiles to 16 single byte shift instructions.
This improves throughput by ~50%
2021-02-14 17:15:40 -08:00
Christopher Berner
a4db356932
Add NEON optimized mul_assign() function
...
Speeds up this op by ~2x
2021-02-14 17:15:40 -08:00
Christopher Berner
d0322d3ca4
Add NEON optimized FMA
...
Speeds up FMA by ~3x, and encoding throughput by ~50%
2021-02-14 17:15:40 -08:00
Christopher Berner
e3e9d6dcc2
Add NEON optimized implementation for octets::add_assign()
2021-02-14 17:15:40 -08:00
Christopher Berner
63b2aec337
Fix 1:255 chance of test failure.
...
The fused FMA function doesn't allow a scalar of 1
2021-02-09 17:55:03 -08:00
Christopher Berner
c134e5b93e
Fix panic on non-x86 platforms
...
This panicked because fused_addassign_mul_scalar does not support
scalar=1, and it was used as the fallback
2021-02-09 17:55:03 -08:00
Christopher Berner
562e64d438
Optimize column swapping substep for r > 1
...
Improves performance by ~4%
2021-02-05 19:51:08 -08:00
Christopher Berner
1241928a84
Optimize column swapping substep for r=1
...
Improves performance by ~1%
2021-02-05 19:51:08 -08:00
Christopher Berner
67a90ede4e
Replace retain() with position() + swap_remove()
...
This improves performance by 1-2%
2021-02-05 19:51:08 -08:00
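A minimal sketch of the pattern named in the commit above, not the crate's actual code. The function names and the `u32` element type are hypothetical. It assumes at most one matching element needs removing and that element order does not matter, which is what makes `swap_remove()` a valid substitute for `retain()`.

```rust
/// Removes the first element equal to `target`, if present.
/// The scan stops at the first match, and `swap_remove` fills the hole with
/// the last element, so the order of the remaining elements is not preserved.
fn remove_first_unordered(values: &mut Vec<u32>, target: u32) -> bool {
    if let Some(index) = values.iter().position(|&v| v == target) {
        values.swap_remove(index);
        true
    } else {
        false
    }
}

/// Baseline for comparison: `retain` visits every element, removes all
/// matches, and keeps the original order.
fn remove_all_ordered(values: &mut Vec<u32>, target: u32) {
    values.retain(|&v| v != target);
}
```

The two functions are only interchangeable when the value occurs at most once and callers do not rely on ordering.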
Christopher Berner
5e506b5b78
Merge .map().filter() into .filter_map()
2021-02-05 19:51:08 -08:00
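As an illustration of the refactoring in the commit above, the sketch below shows a chained `.map().filter()` and an equivalent `.filter_map()`. The functions and the row-of-octets input are hypothetical and are not taken from the crate.

```rust
/// Chained form: build every (index, value) pair, then drop the zero values.
fn nonzero_chained(row: &[u8]) -> Vec<(usize, u8)> {
    row.iter()
        .enumerate()
        .map(|(i, &v)| (i, v))
        .filter(|&(_, v)| v != 0)
        .collect()
}

/// Fused form: decide in a single closure whether to emit a pair at all.
fn nonzero_fused(row: &[u8]) -> Vec<(usize, u8)> {
    row.iter()
        .enumerate()
        .filter_map(|(i, &v)| if v != 0 { Some((i, v)) } else { None })
        .collect()
}
```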
Christopher Berner
7cfef09bc6
Don't eliminate sparse values from HDPC
...
These are never read, except for debugging, and this improves perf by ~1%
2021-01-17 21:24:08 -08:00
Christopher Berner
42c08b85c8
Remove unnecessary condition
...
This is always true, since we're in the r = 1 case
2021-01-17 21:24:08 -08:00
Christopher Berner
fa2064796c
Fix some Clippy warnings
2021-01-17 15:53:44 -08:00
Christopher Berner
eb07e23208
Optimize HDPC generation with recursive calculation
...
Improves performance on large symbol counts by > 10%
2021-01-17 15:31:47 -08:00
Christopher Berner
f36bf73ca9
Reduce calls to rand() during HDPC generation
...
Small improvement to performance. Perhaps 1%
2021-01-17 13:09:52 -08:00
Christopher Berner
6546b714ad
Skip elimination in V section of A during first phase
...
This is safe due to Errata 11, and improves performance by a couple of
percent
2021-01-17 10:30:34 -08:00
Christopher Berner
24235dd213
Optimize first phase to call ones_in_column() only once for r = 1 case
2020-12-26 20:46:47 -08:00
Christopher Berner
602fc8711d
Reduce length of merge chains in union-find data structure
2020-12-26 20:46:47 -08:00
Christopher Berner
3e88b065dd
Optimize graph substep
...
Use a union-find data structure which is incrementally updated, instead
of always recomputing the entire graph
This improves performance by 5-10%
2020-12-26 10:05:53 -08:00
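The commit above describes keeping a union-find structure up to date incrementally instead of rebuilding the graph. The sketch below is a generic union-find with path halving and union by size, given as an assumption about the general technique. The type and method names are hypothetical and the crate's real structure may differ.

```rust
/// Generic disjoint-set (union-find) over node ids 0..n.
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect(), size: vec![1; n] }
    }

    /// Finds the root of `x`, halving the path on the way up so that
    /// later lookups follow shorter chains.
    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent;
            x = grandparent;
        }
        x
    }

    /// Incrementally merges the components containing `a` and `b`,
    /// attaching the smaller component under the larger one.
    fn union(&mut self, a: usize, b: usize) {
        let (mut root_a, mut root_b) = (self.find(a), self.find(b));
        if root_a == root_b {
            return;
        }
        if self.size[root_a] < self.size[root_b] {
            std::mem::swap(&mut root_a, &mut root_b);
        }
        self.parent[root_b] = root_a;
        self.size[root_a] += self.size[root_b];
    }

    /// Number of nodes in the component containing `x`.
    fn component_size(&mut self, x: usize) -> usize {
        let root = self.find(x);
        self.size[root]
    }
}
```

Each new edge costs one `union` call, so component sizes stay available without recomputing connectivity from scratch.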
Christopher Berner
26c9c2f6a0
Remove eliminate_leading_value()
...
Also fix usage and semantics of selection helper .resize() method.
Previously, it said all values in first column had to be zero, but it
was called before those were eliminated
2020-12-26 10:05:53 -08:00
Christopher Berner
11d2de97f2
Update to rand 0.8
2020-12-19 13:14:12 -08:00
Christopher Berner
102c6a5a86
Optimize DenseBinaryMatrix
...
Switch to a single contiguous vector instead of vec of vecs
This improves performance by ~5%, especially for smaller symbol counts
2020-12-07 22:16:38 -08:00
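To illustrate the layout change in the commit above, here is a minimal sketch of a bit matrix stored in one contiguous vector rather than a vec of vecs. `FlatBitMatrix`, its fields, and the 64-bit word size are assumptions made for illustration and do not reflect the actual `DenseBinaryMatrix` internals.

```rust
/// Bit matrix backed by a single allocation. Row `r` occupies the words
/// `r * row_words .. (r + 1) * row_words`.
struct FlatBitMatrix {
    row_words: usize,
    words: Vec<u64>,
}

impl FlatBitMatrix {
    fn new(rows: usize, cols: usize) -> Self {
        // Round up so a partially filled final word is still allocated.
        let row_words = (cols + 63) / 64;
        FlatBitMatrix { row_words, words: vec![0; rows * row_words] }
    }

    /// Returns the word index and bit mask for a (row, col) position.
    fn locate(&self, row: usize, col: usize) -> (usize, u64) {
        (row * self.row_words + col / 64, 1u64 << (col % 64))
    }

    fn set(&mut self, row: usize, col: usize, value: bool) {
        let (index, mask) = self.locate(row, col);
        if value {
            self.words[index] |= mask;
        } else {
            self.words[index] &= !mask;
        }
    }

    fn get(&self, row: usize, col: usize) -> bool {
        let (index, mask) = self.locate(row, col);
        self.words[index] & mask != 0
    }
}
```

A single allocation avoids one pointer indirection per row access and keeps adjacent rows next to each other in memory.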
Christopher Berner
7b0d1c5cff
Remove X matrix from release builds
...
This improves performance on small symbol counts by ~5%
2020-12-06 15:36:43 -08:00
Christopher Berner
5c13e8de6e
Further optimize fused_addassign_mul_scalar_binary_avx2()
...
Move calculation of control flags out of loop to avoid one OR
instruction inside loop. Also statically enable BMI1 and detect its
presence to ensure that BEXTR2 intrinsic is inlined
Improves performance by ~5% on small symbol counts
2020-12-06 15:36:43 -08:00
Christopher Berner
d87e46c625
Optimize DenseBinaryMatrix.swap_columns()
...
Improves performance by 10-15% for symbol count = 100
2020-12-06 08:15:21 -08:00
Christopher Berner
3a05d7be3e
Add BinaryOctetVec
...
Improves encoding speed of large symbol counts by ~5%
2020-12-06 08:15:21 -08:00
Christopher Berner
50301e1b5b
Optimize query_non_zero_columns()
...
This reduces the time spent in the fourth phase from ~6% of encoding
time to ~1%, according to perf, and improves overall throughput by 3-4%
on large symbol counts.
2020-11-29 09:51:56 -08:00
Christopher Berner
c4d227fba1
Optimize memory layout of dense U matrix
...
Previously we used column major ordering. Switch to row major to
optimize sequential access of rows which is much more common in the
first phase, and can also be used in the fourth phase
This improves performance by ~10% on large symbol counts
2020-11-28 21:08:35 -08:00
Christopher Berner
6245ab1c9a
Fix over-allocation of memory for dense U section of matrix
...
The previous code had an off by one error leading to an extra word being
allocated for each row
2020-11-28 17:22:50 -08:00
Christopher Berner
3a4068a726
Fix typo in spelling of "access"
2020-11-28 17:22:50 -08:00
Christopher Berner
9a849add9b
Optimize vector creation in get_sub_row_as_octets()
...
Improves performance by ~5%
2020-11-28 17:22:50 -08:00
Christopher Berner
8b462f5c83
Optimize processing of U matrix
...
Optimize Phases 2-5 to avoid writes to the first i columns.
Additionally, use pre-computed ops from first phase to implement third &
fifth phases
This improves encoding performance by ~15%, especially on large symbol
counts
2020-11-27 21:36:17 -08:00
Christopher Berner
a96272a0c7
Remove useless cfg guard
2020-11-23 20:00:42 -08:00
AnthonyMikh
4a6ddf1c26
Avoid bounds checking in loop
...
Slicing `src` checks bounds only once instead of on every iteration of loop.
2020-10-26 19:33:34 -07:00
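A sketch of the idea in the commit above, using hypothetical function names. Re-slicing `src` to the length of `dst` performs one bounds check up front. Whether the compiler then drops the per-iteration checks depends on the optimizer, so treat this as an assumption to verify with a benchmark.

```rust
/// Indexed form: `src[i]` may be bounds-checked on every iteration because
/// the compiler cannot assume `src` is at least as long as `dst`.
fn add_assign_indexed(dst: &mut [u8], src: &[u8]) {
    for i in 0..dst.len() {
        dst[i] ^= src[i];
    }
}

/// Sliced form: one check when slicing (it panics if `src` is too short),
/// after which both slices are known to have the same length.
fn add_assign_sliced(dst: &mut [u8], src: &[u8]) {
    let src = &src[..dst.len()];
    for i in 0..dst.len() {
        dst[i] ^= src[i];
    }
}
```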
Christopher Berner
4bf46ec16b
Upgrade primal and pyo3 dependencies
2020-10-24 11:26:33 -07:00
Jonathan Nilsson
ab75fc1b6d
OTI is Copy
2020-10-22 22:49:59 -07:00
Jonathan Nilsson
a81ca51f41
I need access to the partition function for my decoder and I want to create an encoder from an ObjectTransmissionInformation
2020-10-22 22:49:59 -07:00
Jonathan Nilsson
a788b14bac
Remove some clones and allocations
2020-10-16 22:03:36 -07:00
Christopher Berner
95b6b5ae91
Make serde support optional
2020-08-30 09:39:39 -07:00
Christopher Berner
6330f94c4c
Simplify multiplication table initialization
...
Replace unrolled loops with const fn while which is new in Rust 1.46
2020-08-29 21:23:40 -07:00
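The commit above replaces unrolled loops with `while` loops inside a `const fn`. The sketch below shows the general shape of that technique for one row of a GF(256) multiplication table. The function names and the single-row table are hypothetical, and the reduction constant assumes the field polynomial x^8 + x^4 + x^3 + x^2 + 1 specified by RFC 6330.

```rust
/// Multiplies two GF(256) elements by shift-and-add, reducing modulo
/// x^8 + x^4 + x^3 + x^2 + 1 (low byte 0x1D). Assumed field polynomial.
const fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            product ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1D;
        }
        b >>= 1;
    }
    product
}

/// Builds one table row, `scalar * i` for every octet `i`, at compile time.
const fn mul_row(scalar: u8) -> [u8; 256] {
    let mut row = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        row[i] = gf256_mul(scalar, i as u8);
        i += 1;
    }
    row
}

/// Evaluated entirely by the compiler; no runtime initialization is needed.
const MUL_BY_3: [u8; 256] = mul_row(3);
```

`for` loops are not available in `const fn`, which is why the index is advanced manually with `while`.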
Christopher Berner
f8240da5b5
Fix incorrect symbol calculation assertion
2020-06-23 21:07:16 -07:00
Christopher Berner
dca2ad8b7c
Fix Clippy warnings
2020-06-23 20:50:03 -07:00
Christopher Berner
e08c78a800
Avoid allocating excess memory
2020-05-07 10:16:43 -07:00
Christopher Berner
48a9dcc2c0
Remove dead code
2020-05-07 10:16:43 -07:00
Christopher Berner
97aa0b5003
Fix crash in Decoder when decoding large numbers of blocks
2020-03-28 14:53:34 -07:00
Christopher Berner
88a9d6d582
Fix Clippy style warning
2020-03-20 23:32:45 -07:00
Christopher Berner
69246f50b1
Add public function to calculate object to block splits
2020-03-20 22:58:09 -07:00
Christopher Berner
7847099cd7
Fix source block numbering with uneven blocks
...
This fixes a critical bug where blocks with ids after ZL (see section
4.4.1.2 of the RFC) were incorrectly numbered during encoding
2020-03-14 08:38:52 -07:00
Christopher Berner
e12085c195
Add assertion from RFC to parameter calculation
2020-03-14 08:38:52 -07:00
Christopher Berner
329598c48b
Implement sub block support
2020-02-25 22:50:29 -08:00