llama.cpp

mirror of https://git.adityakumar.xyz/llama.cpp.git synced 2024-11-09 15:29:43 +00:00

Author	SHA1	Message	Date
Stephan Walter	1b107b8550	ggml : generalize `quantize_fns` for simpler FP16 handling (#1237) * Generalize quantize_fns for simpler FP16 handling * Remove call to ggml_cuda_mul_mat_get_wsize * ci : disable FMA for mac os actions --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-05 19:13:06 +03:00
katsu560	a84ab1da8d	tests : fix quantize perf (#1990) * fix test quantize perf * avoid the global state	2023-06-26 19:47:02 +03:00
Borislav Stanimirov	9cbf50c041	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
Stephan Walter	c50b628810	Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)	2023-04-22 10:54:13 +00:00
unbounded	5f939498d5	ggml : unit test for quantization functions (#953) * Unit test for quantization functions. Use the ggml_internal_get_quantize_fn function to loop through all quantization formats and run a sanity check on the result. Also add a microbenchmark that times these functions directly without running the rest of the GGML graph. * test-quantize-fns: CI fixes. Fix issues uncovered in CI: - need to use sizes divisible by 32*8 for loop unrolling - use intrinsic header that should work on Mac * test-quantize: remove. Per PR comment, subsumed by test-quantize-fns * test-quantize: fix for q8_0 intermediates	2023-04-22 12:10:39 +03:00
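
The sanity check described in the oldest commit above (quantize, dequantize, compare against the input) can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the `ToyBlock` layout, the helper names (`toy_quantize`, `toy_dequantize`, `rmse`, `synthetic_data`, `roundtrip_ok`), the synthetic input, and the error threshold are all assumptions made for the example. It uses a toy 8-bit block format with 32 values per block instead of going through `ggml_internal_get_quantize_fn`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy format: each block of 32 floats shares one float scale.
constexpr std::size_t kBlockSize = 32;

struct ToyBlock {
    float scale;                // dequantized value = scale * q[i]
    std::int8_t q[kBlockSize];  // quantized values in [-127, 127]
};

// Quantize x block by block; x.size() must be a multiple of kBlockSize.
std::vector<ToyBlock> toy_quantize(const std::vector<float>& x) {
    assert(x.size() % kBlockSize == 0);
    std::vector<ToyBlock> out(x.size() / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = x.data() + b * kBlockSize;
        float amax = 0.0f;  // largest magnitude in this block
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(src[i]));
        }
        const float scale = amax / 127.0f;
        const float inv = scale > 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out[b].q[i] = static_cast<std::int8_t>(std::lround(src[i] * inv));
        }
    }
    return out;
}

// Reverse the mapping: one float per stored 8-bit value.
std::vector<float> toy_dequantize(const std::vector<ToyBlock>& blocks) {
    std::vector<float> y;
    y.reserve(blocks.size() * kBlockSize);
    for (const ToyBlock& blk : blocks) {
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            y.push_back(blk.scale * static_cast<float>(blk.q[i]));
        }
    }
    return y;
}

// Root-mean-square error between two equally sized vectors.
float rmse(const std::vector<float>& a, const std::vector<float>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return static_cast<float>(std::sqrt(sum / static_cast<double>(a.size())));
}

// Deterministic synthetic input (an assumption, chosen for reproducibility).
std::vector<float> synthetic_data(std::size_t n) {
    std::vector<float> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = 0.1f + 2.0f * std::cos(static_cast<float>(i));
    }
    return x;
}

// Round-trip check: true if the reconstruction error stays under max_rmse.
bool roundtrip_ok(std::size_t n, float max_rmse) {
    const std::vector<float> x = synthetic_data(n);
    const std::vector<float> y = toy_dequantize(toy_quantize(x));
    return y.size() == x.size() && rmse(x, y) < max_rmse;
}
```

Calling `roundtrip_ok(32 * 8, 0.01f)` exercises the check with a size divisible by 32*8, matching the constraint mentioned in the commit message. The 0.01 threshold is illustrative; for this toy format the per-element error is bounded by half a quantization step, so a real test would pick a threshold per format.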