llama.cpp

mirror of https://git.adityakumar.xyz/llama.cpp.git synced 2024-09-19 11:13:04 +00:00

Author	SHA1	Message	Date
Jiří Podivín	27ab66e437	py : turn verify-checksum-models.py into executable (#2245 ) README.md was adjusted to reflect the change. Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-07-16 22:54:47 +03:00
Georgi Gerganov	1b6efeab82	tests : fix test-grad0	2023-07-05 20:20:25 +03:00
Georgi Gerganov	ed9a54e512	ggml : sync latest (new ops, macros, refactoring) (#2106 ) - add ggml_argmax() - add ggml_tanh() - add ggml_elu() - refactor ggml_conv_1d() and variants - refactor ggml_conv_2d() and variants - add helper macros to reduce code duplication in ggml.c	2023-07-04 21:54:11 +03:00
Jiří Podivín	5ddf7ea1fb	hooks : setting up flake8 and pre-commit hooks (#1681 ) Small, non-functional changes were made to non-compliant files. These include breaking up long lines, whitespace sanitation and unused import removal. Maximum line length in python files was set to a generous 125 chars, in order to minimize number of changes needed in scripts and general annoyance. The "txt" prompts directory is excluded from the checks as it may contain oddly formatted files and strings for a good reason. Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-06-17 13:32:48 +03:00
Georgi Gerganov	b9fd7eee57	ggml : remove bit shuffling (#1405 ) * ggml : remove Q4_0 bit shufling (ARM NEON) * ggml : remove Q4_1 bit shuffling (ARM NEON + reference) * ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) * ggml : remove Q4_2 bit shuffling (WIP, BROKEN) * ggml : remove Q5_0 bit shuffling (ARM NEON) * ggml : 2x faster scalar implementations * ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) * ggml : simplify scalar dot * ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit * ggml : fix Q4_1 quantization * ggml : update cuBLAS + normalize variable names * ggml : remove Q4_2 mode * ggml : minor formatting * ggml : fix Q5_0 quantization * scripts : add script for measuring the time per token * AVX implementations (#1370) * ggml : uniform 5th bit extraction * llama : produce error upon loading old model files * llama : fix model magic/version write * ggml : speed-up Q5_0 + Q5_1 at 4 threads * ggml : preserve old Q4 and Q5 formats * ggml : simplify Q8_1 - no need for low / high sums anymore * ggml : fix Q8_0 and Q8_1 rounding * Revert "AVX implementations (#1370)" This reverts commit 948d124837f9d287d8490f41338e0e4cceb0814f. * ggml : fix AVX2 implementation * sha : update hashes for 7B and 13B * readme : update timings + remove warning banner * llama : update v2 PR number to 1405 * ggml : fix WASM comments * ggml : back to original bit order * readme : add note that Q4 and Q5 have been changed * llama : fix return for unknown version --------- Co-authored-by: Stephan Walter <stephan@walter.name>	2023-05-12 00:23:08 +03:00
Georgi Gerganov	f9a6364912	llama : require first token to be BOS (#1303 ) * llama : require first token to be BOS * scripts : add ppl-run-all.sh * perplexity : add BOS for each chunk * readme : update perplexity values after BOS fix * perplexity : add clarifying comments	2023-05-08 17:41:54 +03:00
Georgi Gerganov	bca9ad938a	minor : fix whitespaces (#1302 )	2023-05-03 20:09:42 +03:00
Georgi Gerganov	e2a937ca6a	minor : fix trailing whitespaces	2023-05-03 18:43:23 +03:00
KASR	b0c71c7b6d	scripts : platform independent script to verify sha256 checksums (#1203 ) * python script to verify the checksum of the llama models Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability. * Update README.md update to the readme for improved readability and to explain the usage of the python checksum verification script * update the verification script I've extended the script based on suggestions by @prusnak The script now checks the available RAM, is there is enough to check the file at once it will do so. If not the file is read in chunks. * minor improvment small change so that the available ram is checked and not the total ram * remove the part of the code that reads the file at once if enough ram is available based on suggestions from @prusnak i removed the part of the code that checks whether the user had enough ram to read the entire model at once. the file is now always read in chunks. * Update verify-checksum-models.py quick fix to pass the git check	2023-05-03 18:31:28 +03:00
kuvaus	9daff419f6	fix build-info.h for git submodules (#1289 ) * make git build info work with submodules --------- Co-authored-by: Green Sky <green@g-s.xyz>	2023-05-03 02:43:43 +02:00
DannyDaemonic	f4cef87edf	Add git-based build information for better issue tracking (#1232 ) * Add git-based build information for better issue tracking * macOS fix * "build (hash)" and "CMAKE_SOURCE_DIR" changes * Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages * Fix conditional dependency on missing target * Broke out build-info.cmake, added find_package fallback, and added build into to all examples, added dependencies to Makefile * 4 space indenting for cmake, attempt to clean up my mess in Makefile * Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it	2023-05-01 18:23:47 +02:00
Georgi Gerganov	284685f169	scripts : add helper scripts to synch ggml repo	2023-04-23 19:57:09 +03:00

12 commits