llama.cpp

mirror of https://git.adityakumar.xyz/llama.cpp.git synced 2024-11-09 15:29:43 +00:00

Author	SHA1	Message	Date
Chad Brewbaker	917831c63a	readme : fix zig build instructions (#2171 )	2023-07-11 19:03:06 +03:00
Evan Miller	5656d10599	mpi : add support for distributed inference via MPI (#2099 ) * MPI support, first cut * fix warnings, update README * fixes * wrap includes * PR comments * Update CMakeLists.txt * Add GH workflow, fix test * Add info to README * mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099) * mpi : add names for layer inputs + prep ggml_mpi_graph_compute() * mpi : move all MPI logic into ggml-mpi Not tested yet * mpi : various fixes - communication now works but results are wrong * mpi : fix output tensor after MPI compute (still not working) * mpi : fix inference * mpi : minor * Add OpenMPI to GH action * [mpi] continue-on-error: true * mpi : fix after master merge * [mpi] Link MPI C++ libraries to fix OpenMPI * tests : fix new llama_backend API * [mpi] use MPI_INT32_T * mpi : factor out recv / send in functions and reuse * mpi : extend API to allow usage with outer backends (e.g. Metal) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-10 18:49:56 +03:00
JackJollimore	18780e0a5e	readme : update Termux instructions (#2147 ) The file pathing is significant when running models inside of Termux on Android devices. llama.cpp performance is improved with loading a .bin from the $HOME directory.	2023-07-09 11:20:43 +03:00
rankaiyx	2492a53fd0	readme : add more docs indexes (#2127 ) * Update README.md to add more docs indexes * Update README.md to add more docs indexes	2023-07-09 10:38:42 +03:00
dylan	84525e7962	docker : add support for CUDA in docker (#1461 ) Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-07 21:25:25 +03:00
Judd	36680f6e40	convert : update for baichuan (#2081 ) 1. guess n_layers; 2. relax warnings on context size; 3. add a note that its derivations are also supported. Co-authored-by: Judd <foldl@boxvest.com>	2023-07-06 19:23:49 +03:00
Johannes Gäßler	924dd22fd3	Quantized dot products for CUDA mul mat vec (#2067 )	2023-07-05 14:19:42 +02:00
Georgi Gerganov	b472f3fca5	readme : add link web chat PR	2023-07-04 22:25:22 +03:00
Judd	471aab6e4c	convert : add support of baichuan-7b (#2055 ) Co-authored-by: Judd <foldl@boxvest.com>	2023-07-01 20:00:25 +03:00
Roman Parykin	d38e451578	readme : add Scala 3 bindings repo (#2010 )	2023-06-26 22:47:59 +03:00
Gustavo Rocha Dias	aa777abbb7	readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007 ) * docs - Alternative way to build at Android, with CLBlast. * doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux. * doc- fix typo	2023-06-26 22:34:45 +03:00
Georgi Gerganov	412c60e473	readme : add link to new k-quants for visibility	2023-06-26 19:45:09 +03:00
Georgi Gerganov	447ccbe8c3	readme : add new roadmap + manifesto	2023-06-25 16:08:12 +03:00
Georgi Gerganov	66a2555ba6	readme : add Azure CI discussion link	2023-06-25 09:07:03 +03:00
Georgi Gerganov	11da1a85cd	readme : fix whitespaces	2023-06-24 13:38:18 +03:00
Alberto	235b610d65	readme : fixed termux instructions (#1973 )	2023-06-24 13:32:13 +03:00
eiery	d7b7484f74	Add OpenLLaMA instructions to the README (#1954 ) * add openllama to readme	2023-06-23 10:38:01 +02:00
Rahul Vivek Nair	fb98254f99	Fix typo in README.md (#1961 )	2023-06-21 23:48:43 +02:00
Georgi Gerganov	049aa16b8c	readme : add link to p1	2023-06-20 19:05:54 +03:00
Xiake Sun	2322ec223a	Fix typo (#1949 )	2023-06-20 15:42:40 +03:00
Johannes Gäßler	16b9cd1939	Convert vector to f16 for dequantize mul mat vec (#1913 ) * Convert vector to f16 for dmmv * compile option * Added compilation option description to README * Changed cmake CUDA_ARCHITECTURES from "OFF" to "native"	2023-06-19 10:23:56 +02:00
Mike	e1886cf4fe	readme : update Android build instructions (#1922 ) Add steps for using termux on android devices to prevent common errors.	2023-06-18 11:28:26 +03:00
Johannes Gäßler	2c9380dd2f	Only one CUDA stream per device for async compute (#1898 )	2023-06-17 19:15:02 +02:00
Gustavo Rocha Dias	bac19927c3	readme : alternative way to build for Android with CLBlast. (#1828 )	2023-06-17 12:01:06 +03:00
Aisuko	059e99066d	doc : fix wrong address of BLIS.md (#1772 ) Signed-off-by: Aisuko <urakiny@gmail.com>	2023-06-10 17:08:11 +03:00
Georgi Gerganov	4dc62c545d	readme : add June roadmap	2023-06-07 07:15:08 +03:00
Yuval Peled	f4c55d3bd7	docs : add performance troubleshoot + example benchmark documentation (#1674 ) * test anchor link * test table * add benchmarks * Add performance troubleshoot & benchmark * add benchmarks * remove unneeded line --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-05 23:32:36 +03:00
Foul-Tarnished	f1465624c2	readme : fix typo (#1700 ) Fix a typo in a command in README.md	2023-06-05 23:28:37 +03:00
Georgi Gerganov	827f5eda91	readme : update hot topics	2023-06-04 23:38:19 +03:00
Georgi Gerganov	ecb217db4f	llama : Metal inference (#1642 ) * mtl : export the LLaMA computation graph * ci : disable temporary * mtl : adapt the MNIST example as starter * mtl : no need for mtl-export tool, add cli arg for main instead * mtl : export just a small part of the graph for now to make it easier * mtl : move MSL code into separate file for easy editing * mtl : initial get_rows_q4_0 kernel * mtl : confirmed get_rows_q4_0 is working correctly * mtl : add rms_norm kernel + confirm working * mtl : add mul kernel + confirm working * mtl : initial mul_mat Q4 kernel (wrong results) * mtl : mul_mat fixes (still wrong) * mtl : another mul_mat Q4 (still does not work) * mtl : working mul_mat q4 * ggml : fix handling of "view" ops in ggml_graph_import() * mtl : add rope kernel * mtl : add reshape and transpose handling * ggml : store offset as opt arg for ggml_view_xd() operators * mtl : add cpy kernel + handle view ops * mtl : confirm f16 x f32 attention mul mat * mtl : add scale kernel * mtl : add diag_mask_inf kernel * mtl : fix soft_max kernel * ggml : update ggml_nbytes() to handle non-contiguous tensors * mtl : verify V tensor contents * mtl : add f32 -> f32 cpy kernel * mtl : add silu kernel * mtl : add non-broadcast mul kernel * mtl : full GPU inference of the computation graph * mtl : optimize rms_norm and soft_max kernels * mtl : add f16 mat x f32 vec multiplication kernel * mtl : fix bug in f16 x f32 mul mat + speed-up computation * mtl : faster mul_mat_q4_0_f32 kernel * mtl : fix kernel signature + roll inner loop * mtl : more threads for rms_norm + better timing * mtl : remove printfs from inner loop * mtl : simplify implementation * mtl : add save/load vocab to ggml file * mtl : plug Metal inference into llama.cpp (very quick-n-dirty) * mtl : make it work with main example Lots of hacks but at least now it generates text * mtl : preparing for merge * mtl : clean-up ggml mtl interface + suport scratch / inplace * mtl : remove temp / debug code * metal : final refactoring and simplification * Revert "ci : disable temporary" This reverts commit 98c267fc77fe811082f672538fc91bcfc9072d63. * metal : add comments * metal : clean-up stuff, fix typos * readme : add Metal instructions * readme : add example for main	2023-06-04 23:34:30 +03:00
Henri Vasserman	d8bd0013e8	Add info about CUDA_VISIBLE_DEVICES (#1682 )	2023-06-03 16:35:20 +03:00
Henri Vasserman	97c9b77c4f	Add documentation about CLBlast (#1604 ) Installing, compiling and using.	2023-05-27 18:47:55 +03:00
Evan Jones	c31bbe934b	readme : add docs for chat-persistent.sh (#1568 ) * readme : add docs for chat-persistent.sh * Update README.md	2023-05-24 09:24:01 +03:00
Zenix	b8ee340abe	feature : support blis and other blas implementation (#1536 ) * feature: add blis support * feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927 * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake * Fix typo in INTEGER Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix: blas changes on ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-20 17:58:31 +03:00
Georgi Gerganov	ea600071cb	Revert "feature : add blis and other BLAS implementation support (#1502 )" This reverts commit `07e9ace0f9`.	2023-05-20 12:03:48 +03:00
Zenix	07e9ace0f9	feature : add blis and other BLAS implementation support (#1502 ) * feature: add blis support * feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927 * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake * Fix typo in INTEGER Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-20 12:02:48 +03:00
Georgi Gerganov	2d5db48371	ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508 ) * ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0 * llama : bump LLAMA_FILE_VERSION to 3 * cuda : update Q4 and Q8 dequantize kernels * ggml : fix AVX dot products * readme : update performance table + hot topics	2023-05-19 22:17:18 +03:00
David Kennedy	79e3efb0e9	readme : adds WizardLM to the list of supported models (#1485 )	2023-05-19 20:16:30 +03:00
Georgi Gerganov	cdd5350892	readme : update Q4_0 perplexities I think these were affected by the removal of the `round` during quantization	2023-05-13 09:12:44 +03:00
Rinne	089b1c93ba	readme : add C#/.NET bindings repo (#1409 )	2023-05-12 08:39:40 +03:00
Georgi Gerganov	b9fd7eee57	ggml : remove bit shuffling (#1405 ) * ggml : remove Q4_0 bit shufling (ARM NEON) * ggml : remove Q4_1 bit shuffling (ARM NEON + reference) * ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) * ggml : remove Q4_2 bit shuffling (WIP, BROKEN) * ggml : remove Q5_0 bit shuffling (ARM NEON) * ggml : 2x faster scalar implementations * ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) * ggml : simplify scalar dot * ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit * ggml : fix Q4_1 quantization * ggml : update cuBLAS + normalize variable names * ggml : remove Q4_2 mode * ggml : minor formatting * ggml : fix Q5_0 quantization * scripts : add script for measuring the time per token * AVX implementations (#1370) * ggml : uniform 5th bit extraction * llama : produce error upon loading old model files * llama : fix model magic/version write * ggml : speed-up Q5_0 + Q5_1 at 4 threads * ggml : preserve old Q4 and Q5 formats * ggml : simplify Q8_1 - no need for low / high sums anymore * ggml : fix Q8_0 and Q8_1 rounding * Revert "AVX implementations (#1370)" This reverts commit 948d124837f9d287d8490f41338e0e4cceb0814f. * ggml : fix AVX2 implementation * sha : update hashes for 7B and 13B * readme : update timings + remove warning banner * llama : update v2 PR number to 1405 * ggml : fix WASM comments * ggml : back to original bit order * readme : add note that Q4 and Q5 have been changed * llama : fix return for unknown version --------- Co-authored-by: Stephan Walter <stephan@walter.name>	2023-05-12 00:23:08 +03:00
Georgi Gerganov	56551bc11f	readme : add notice about upcoming breaking change	2023-05-08 22:52:18 +03:00
AlpinDale	fe60904eef	readme : add TOC and Pygmalion instructions (#1359 )	2023-05-08 19:33:30 +03:00
Georgi Gerganov	f9a6364912	llama : require first token to be BOS (#1303 ) * llama : require first token to be BOS * scripts : add ppl-run-all.sh * perplexity : add BOS for each chunk * readme : update perplexity values after BOS fix * perplexity : add clarifying comments	2023-05-08 17:41:54 +03:00
Johannes Gäßler	1f48b0abcf	Documented CUDA reproducibility, added warning (#1346 )	2023-05-08 02:42:01 +02:00
DaniAndTheWeb	173d0e6419	makefile: automatic Arch Linux detection (#1332 ) This commit is a port of a detection method used in koboldcpp's Makefile in order to automatically set the -lcblas option on Arch Linux	2023-05-05 23:57:14 +02:00
Pavol Rusnak	921dcee00a	readme: add missing info (#1324 )	2023-05-05 16:43:36 +02:00
44670	360cfe5bec	readme : add OpenBuddy link (#1321 )	2023-05-04 19:33:31 +03:00
Georgi Gerganov	bca9ad938a	minor : fix whitespaces (#1302 )	2023-05-03 20:09:42 +03:00
KASR	b0c71c7b6d	scripts : platform independent script to verify sha256 checksums (#1203 ) * python script to verify the checksum of the llama models Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability. * Update README.md update to the readme for improved readability and to explain the usage of the python checksum verification script * update the verification script I've extended the script based on suggestions by @prusnak The script now checks the available RAM, is there is enough to check the file at once it will do so. If not the file is read in chunks. * minor improvment small change so that the available ram is checked and not the total ram * remove the part of the code that reads the file at once if enough ram is available based on suggestions from @prusnak i removed the part of the code that checks whether the user had enough ram to read the entire model at once. the file is now always read in chunks. * Update verify-checksum-models.py quick fix to pass the git check	2023-05-03 18:31:28 +03:00

147 commits