{"id":5525,"date":"2025-03-05T12:10:47","date_gmt":"2025-03-05T04:10:47","guid":{"rendered":"https:\/\/cathayvista.top\/?p=5525"},"modified":"2025-03-05T13:17:23","modified_gmt":"2025-03-05T05:17:23","slug":"large-language-model-llm-zhen","status":"publish","type":"post","link":"https:\/\/cathayvista.top\/index.php\/2025\/03\/05\/large-language-model-llm-zhen\/","title":{"rendered":"\u5927\u8bed\u8a00\u6a21\u578b \/ Large language model (LLM) &#8211; \u4e2d\u82f1\u6587\u7ef4\u57fa\u767e\u79d1\u8bcd\u6761\u878d\u5408"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"has-small-font-size\">\u4e2d\u6587\u8bcd\u6761\u539f\u6587\u94fe\u63a5\uff08\u65e0\u6cd5\u4ece\u4e2d\u56fd\u5185\u5730\u8bbf\u95ee\uff09\uff1a<a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B\" target=\"_blank\" rel=\"noreferrer noopener\">\u8bf7\u70b9\u51fb\u8fd9\u91cc\u8bbf\u95ee<\/a> <br>\u82f1\u6587\u8bcd\u6761\u539f\u6587\u94fe\u63a5\uff08\u65e0\u6cd5\u4ece\u4e2d\u56fd\u5185\u5730\u8bbf\u95ee\uff09\uff1a<a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model\" target=\"_blank\" rel=\"noreferrer noopener\">\u8bf7\u70b9\u51fb\u8fd9\u91cc\u8bbf\u95ee<\/a><br>\u672c\u6587\u57fa\u4e8e\u82f1\u6587\u8bcd\u6761\u7684\u7ebf\u7d22\uff0c\u5e76\u8865\u5145\u90e8\u5206\u6765\u81ea\u4e2d\u6587\u8bcd\u6761\u7684\u5185\u5bb9\uff08\u5728\u4e8c\u8005\u51b2\u7a81\u65f6\uff0c\u4ee5\u66f4\u665a\u66f4\u65b0\u8005\u4e3a\u51c6\uff09\u3002 \u8fbd\u89c2\u642c\u8fd0\u65f6\u8fdb\u884c\u4e86\u5fc5\u8981\u7684\u5408\u89c4\u5316\u5904\u7406\uff0c\u4ee5\u4f7f\u5176\u80fd\u591f\u5728\u4e2d\u56fd\u5185\u5730\u4e0a\u4f20\u3002\u90e8\u5206\u6587\u5b57\u91c7\u7528<strong>\u6c49\u8bed\u62fc\u97f3<\/strong>\u65b9\u5f0f\u4ee3\u66ff\uff0c\u97f3\u8282\u540e\u7684\u6570\u5b57\u8868\u793a\u6c49\u8bed\u62fc\u97f3\u89c4\u5219\u4e2d\u7684\u58f0\u8c03\u3002 <\/p>\n\n\n\n<p class=\"has-small-font-size\">\u5173\u4e8e\u8fbd\u89c2\u7684\u7ef4\u57fa\u767e\u79d1\u642c\u8fd0\u8ba1\u5212\uff0c\u53ca\u5176\u4ed6\u5df2\u642c\u8fd0\u7684\u8bcd\u6761\uff0c<a href=\"https:\/\/cathayvista.top\/index.php\/wikipedia-intro\/\" data-type=\"link\" data-id=\"https:\/\/cathayvista.top\/index.php\/wikipedia-intro\/\" target=\"_blank\" rel=\"noreferrer noopener\">\u8bf7\u70b9\u51fb\u8fd9\u91cc\u4e86\u89e3\u66f4\u591a<\/a>\u3002\u7ef4\u57fa\u767e\u79d1\uff08Wikipedia\uff09\u662f\u7f8e\u56fd\u7ef4\u57fa\u5a92\u4f53\u57fa\u91d1\u4f1a\u7684\u4e92\u8054\u7f51\u767e\u79d1\u9879\u76ee\uff0c\u5176\u5185\u5bb9\u53ef\u80fd\u53d7\u5230\u7acb\u573a\u3001\u4fe1\u606f\u6765\u6e90\u7b49\u56e0\u7d20\u5f71\u54cd\uff0c\u8bf7\u5ba2\u89c2\u770b\u5f85\u3002\u6b63\u6587\u5185\u5bb9\u4e0d\u4ee3\u8868\u8bd1\u8005\u89c2\u70b9\u3002 <\/p>\n\n\n\n<p class=\"has-small-font-size\">\u8fbd\u89c2\u63d0\u4f9b\u7684\u7ffb\u8bd1\u4ec5\u4f9b\u53c2\u8003\u3002<strong>\u6587\u4e2d\u53ef\u80fd\u5305\u542b\u65e0\u6cd5\u4ece\u4e2d\u56fd\u5185\u5730\u8bbf\u95ee\u7684\u94fe\u63a5\u3002<\/strong> <\/p>\n<cite>\u8fbd\u89c2\u6240\u642c\u8fd0\u7684\u8bcd\u6761\u6587\u672c\u4e0e\u7ef4\u57fa\u767e\u79d1\u4e00\u9053\u540c\u6837\u9075\u5faaCC BY-SA 4.0\u534f\u8bae\uff08<a href=\"https:\/\/zhuanlan.zhihu.com\/p\/653887754?utm_psn=1730141199812366337\" target=\"_blank\" rel=\"noreferrer noopener\">\u8fbd\u89c2\u642c\u8fd0\u7684\u4e2d\u82f1\u6587\u5bf9\u7167\u7248\u672c<\/a>\uff09\uff0c\u5728\u7b26\u5408\u534f\u8bae\u8981\u6c42\u7684\u60c5\u51b5\u4e0b\u60a8\u53ef\u4ee5\u514d\u8d39\u4f7f\u7528\u5176\u5185\u5bb9\uff08\u5305\u62ec\u5546\u7528\uff09\u3002\u56fe\u7247\u548c\u89c6\u9891\u53ef\u80fd\u9075\u5faa\u4e0d\u540c\u7684\u5171\u4eab\u534f\u8bae\u3002<a href=\"https:\/\/zhuanlan.zhihu.com\/p\/666846485?utm_psn=1730141316690960386\" target=\"_blank\" rel=\"noreferrer noopener\">\u8bf7\u70b9\u51fb\u8fd9\u91cc\u8bbf\u95ee<\/a><\/cite><\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">1. \u6b63\u6587\uff08\u53d1\u5e03\u4e8e\u77e5\u4e4e\u4e13\u680f\uff09<\/h2>\n\n\n\n<p><a href=\"https:\/\/zhuanlan.zhihu.com\/p\/28069013106\">\u8bf7\u70b9\u51fb\u8fd9\u91cc\u8bbf\u95ee<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"\u53c3\u898b\">2. \u53c2\u89c1\uff08\u7ef4\u57fa\u767e\u79d1\u7684\u76f8\u5173\u8bcd\u6761\uff09| See also<\/h2>\n\n\n\n<ul class=\"wp-block-list has-small-font-size\">\n<li><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E8%81%8A%E5%A4%A9%E6%A9%9F%E5%99%A8%E4%BA%BA\">\u804a\u5929\u673a\u5668\u4eba<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B\">\u8bed\u8a00\u6a21\u578b<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/zh.wikipedia.org\/wiki\/GPT-4\">GPT-4 \uff08OpenAI)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/zh.wikipedia.org\/wiki\/LLaMA\">LLaMA\uff08Meta\uff09<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%B0%8D%E8%A9%B1%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80%E6%A8%A1%E5%9E%8B\">LaMDA(\u8c37\u6b4c)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/zh.wikipedia.org\/wiki\/Gemini_(%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B)\">Gemini(\u8c37\u6b4c)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Foundation_models\">Foundation models<\/a>\u3010\u57fa\u7840\u6a21\u578b\u3011<\/li>\n\n\n\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_large_language_models\">List of large language models<\/a>\u3010\u5927\u8bed\u8a00\u6a21\u578b\u5217\u8868\u3011<\/li>\n\n\n\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_chatbots\">List of chatbots<\/a>\u3010\u804a\u5929\u673a\u5668\u4eba\u5217\u8868\u3011<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"\u5916\u90e8\u8fde\u63a5\">3. \u82f1\u6587\u8bcd\u6761\u53c2\u8003\u6587\u732e | References<\/h2>\n\n\n\n<ol class=\"wp-block-list has-small-font-size\">\n<li>^\u00a0\u00a0Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (Dec 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.).\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf\">&#8220;Language Models are Few-Shot Learners&#8221;<\/a>\u00a0(PDF).\u00a0<em>Advances in Neural Information Processing Systems<\/em>.\u00a0<strong>33<\/strong>. Curran Associates, Inc.:\u00a01877\u20131901.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20231117204007\/https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf\">Archived<\/a>\u00a0(PDF)\u00a0from the original on 2023-11-17. Retrieved\u00a02023-03-14.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-2\">^<\/a><\/strong>\u00a0Fathallah, Nadeen; Das, Arunav; De Giorgis, Stefano; Poltronieri, Andrea; Haase, Peter; Kovriguina, Liubov (2024-05-26).\u00a0<a href=\"https:\/\/2024.eswc-conferences.org\/wp-content\/uploads\/2024\/05\/77770034.pdf\"><em>NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning<\/em><\/a>\u00a0(PDF). Extended Semantic Web Conference 2024. Hersonissos, Greece.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Manning-2022_3-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Christopher_D._Manning\">Manning, Christopher D.<\/a>\u00a0(2022).\u00a0<a href=\"https:\/\/www.amacad.org\/publication\/human-language-understanding-reasoning\">&#8220;Human Language Understanding &amp; Reasoning&#8221;<\/a>.\u00a0<em>Daedalus<\/em>.\u00a0<strong>151<\/strong>\u00a0(2):\u00a0127\u2013138.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1162%2Fdaed_a_01905\">10.1162\/daed_a_01905<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:248377870\">248377870<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20231117205531\/https:\/\/www.amacad.org\/publication\/human-language-understanding-reasoning\">Archived<\/a>\u00a0from the original on 2023-11-17. Retrieved\u00a02023-03-09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-4\">^<\/a><\/strong>\u00a0Goodman, Joshua (2001-08-09),\u00a0<em>A Bit of Progress in Language Modeling<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/cs\/0108005\">cs\/0108005<\/a>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Bibcode_(identifier)\">Bibcode<\/a>:<a href=\"https:\/\/ui.adsabs.harvard.edu\/abs\/2001cs........8005G\">2001cs&#8230;&#8230;..8005G<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-5\">^<\/a><\/strong>\u00a0Kilgarriff, Adam; Grefenstette, Gregory (September 2003).\u00a0<a href=\"https:\/\/direct.mit.edu\/coli\/article\/29\/3\/333-347\/1816\">&#8220;Introduction to the Special Issue on the Web as Corpus&#8221;<\/a>.\u00a0<em>Computational Linguistics<\/em>.\u00a0<strong>29<\/strong>\u00a0(3):\u00a0333\u2013347.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1162%2F089120103322711569\">10.1162\/089120103322711569<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISSN_(identifier)\">ISSN<\/a>\u00a0<a href=\"https:\/\/search.worldcat.org\/issn\/0891-2017\">0891-2017<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-6\">^<\/a><\/strong>\u00a0Banko, Michele; Brill, Eric (2001).\u00a0<a href=\"https:\/\/dx.doi.org\/10.3115\/1073012.1073017\">&#8220;Scaling to very very large corpora for natural language disambiguation&#8221;<\/a>.\u00a0<em>Proceedings of the 39th Annual Meeting on Association for Computational Linguistics &#8211; ACL &#8217;01<\/em>. Morristown, NJ, USA: Association for Computational Linguistics:\u00a026\u201333.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.3115%2F1073012.1073017\">10.3115\/1073012.1073017<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-7\">^<\/a><\/strong>\u00a0Resnik, Philip; Smith, Noah A. (September 2003).\u00a0<a href=\"https:\/\/direct.mit.edu\/coli\/article\/29\/3\/349-380\/1809\">&#8220;The Web as a Parallel Corpus&#8221;<\/a>.\u00a0<em>Computational Linguistics<\/em>.\u00a0<strong>29<\/strong>\u00a0(3):\u00a0349\u2013380.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1162%2F089120103322711578\">10.1162\/089120103322711578<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISSN_(identifier)\">ISSN<\/a>\u00a0<a href=\"https:\/\/search.worldcat.org\/issn\/0891-2017\">0891-2017<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240607172811\/https:\/\/direct.mit.edu\/coli\/article\/29\/3\/349-380\/1809\">Archived<\/a>\u00a0from the original on 2024-06-07. Retrieved\u00a02024-06-07.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-8\">^<\/a><\/strong>\u00a0Halevy, Alon; Norvig, Peter; Pereira, Fernando (March 2009).\u00a0<a href=\"https:\/\/ieeexplore.ieee.org\/document\/4804817\">&#8220;The Unreasonable Effectiveness of Data&#8221;<\/a>.\u00a0<em>IEEE Intelligent Systems<\/em>.\u00a0<strong>24<\/strong>\u00a0(2):\u00a08\u201312.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1109%2FMIS.2009.36\">10.1109\/MIS.2009.36<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISSN_(identifier)\">ISSN<\/a>\u00a0<a href=\"https:\/\/search.worldcat.org\/issn\/1541-1672\">1541-1672<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-9\">^<\/a><\/strong>\u00a0Chen, Leiyu; Li, Shaobo; Bai, Qiang; Yang, Jing; Jiang, Sanlong; Miao, Yanming (2021).\u00a0<a href=\"https:\/\/doi.org\/10.3390%2Frs13224712\">&#8220;Review of Image Classification Algorithms Based on Convolutional Neural Networks&#8221;<\/a>.\u00a0<em>Remote Sensing<\/em>.\u00a0<strong>13<\/strong>\u00a0(22): 4712.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Bibcode_(identifier)\">Bibcode<\/a>:<a href=\"https:\/\/ui.adsabs.harvard.edu\/abs\/2021RemS...13.4712C\">2021RemS&#8230;13.4712C<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.3390%2Frs13224712\">10.3390\/rs13224712<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-10\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Ashish_Vaswani\">Vaswani, Ashish<\/a>; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion;\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Aidan_Gomez\">Gomez, Aidan N<\/a>; Kaiser, \u0141ukasz; Polosukhin, Illia (2017).\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf\">&#8220;Attention is All you Need&#8221;<\/a>\u00a0(PDF).\u00a0<em>Advances in Neural Information Processing Systems<\/em>.\u00a0<strong>30<\/strong>. Curran Associates, Inc.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240221141113\/https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf\">Archived<\/a>\u00a0(PDF)\u00a0from the original on 2024-02-21. Retrieved\u00a02024-01-21.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-11\">^<\/a><\/strong>\u00a0Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (2014). &#8220;Neural Machine Translation by Jointly Learning to Align and Translate&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/1409.0473\">1409.0473<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-12\">^<\/a><\/strong>\u00a0Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020).\u00a0<a href=\"https:\/\/aclanthology.org\/2020.tacl-1.54\">&#8220;A Primer in BERTology: What We Know About How BERT Works&#8221;<\/a>.\u00a0<em>Transactions of the Association for Computational Linguistics<\/em>.\u00a0<strong>8<\/strong>:\u00a0842\u2013866.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2002.12327\">2002.12327<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1162%2Ftacl_a_00349\">10.1162\/tacl_a_00349<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:211532403\">211532403<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20220403103310\/https:\/\/aclanthology.org\/2020.tacl-1.54\/\">Archived<\/a>\u00a0from the original on 2022-04-03. Retrieved\u00a02024-01-21.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-13\">^<\/a><\/strong>\u00a0Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma (2024).\u00a0<a href=\"https:\/\/aclanthology.org\/2024.naacl-long.67\">&#8220;Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers&#8221;<\/a>.\u00a0<em>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)<\/em>. pp.\u00a01223\u20131243.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2307.10700\">2307.10700<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2024.naacl-long.67\">10.18653\/v1\/2024.naacl-long.67<\/a>. Retrieved\u00a02024-12-08.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-14\">^<\/a><\/strong>\u00a0Hern, Alex (14 February 2019).\u00a0<a href=\"https:\/\/www.theguardian.com\/technology\/2019\/feb\/14\/elon-musk-backed-ai-writes-convincing-news-fiction\">&#8220;New AI fake text generator may be too dangerous to release, say creators&#8221;<\/a>.\u00a0<em><a href=\"https:\/\/en.wikipedia.org\/wiki\/The_Guardian\">The Guardian<\/a><\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20190214173112\/https:\/\/www.theguardian.com\/technology\/2019\/feb\/14\/elon-musk-backed-ai-writes-convincing-news-fiction\">Archived<\/a>\u00a0from the original on 14 February 2019. Retrieved\u00a020 January\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-15\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.euronews.com\/next\/2023\/11\/30\/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months\">&#8220;ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months&#8221;<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Euronews\">Euronews<\/a>. November 30, 2023.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240114025250\/https:\/\/www.euronews.com\/next\/2023\/11\/30\/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months\">Archived<\/a>\u00a0from the original on January 14, 2024. Retrieved\u00a0January 20,\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-16\">^<\/a><\/strong>\u00a0Heaven, Will (March 14, 2023).\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/03\/14\/1069823\/gpt-4-is-bigger-and-better-chatgpt-openai\/\">&#8220;GPT-4 is bigger and better than ChatGPT\u2014but OpenAI won&#8217;t say why&#8221;<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/MIT_Technology_Review\">MIT Technology Review<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230317224201\/https:\/\/www.technologyreview.com\/2023\/03\/14\/1069823\/gpt-4-is-bigger-and-better-chatgpt-openai\/\">Archived<\/a>\u00a0from the original on March 17, 2023. Retrieved\u00a0January 20,\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-17\">^<\/a><\/strong>\u00a0Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma (2024).\u00a0<a href=\"https:\/\/aclanthology.org\/2024.naacl-long.67\">&#8220;Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers&#8221;<\/a>.\u00a0<em>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)<\/em>. pp.\u00a01223\u20131243.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2307.10700\">2307.10700<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2024.naacl-long.67\">10.18653\/v1\/2024.naacl-long.67<\/a>. Retrieved\u00a02024-12-08.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-18\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/ourworldindata.org\/grapher\/artificial-intelligence-parameter-count?time=2017-09-05..latest\">&#8220;Parameters in notable artificial intelligence systems&#8221;<\/a>.\u00a0<em>ourworldindata.org<\/em>. November 30, 2023. Retrieved\u00a0January 20,\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-19\">^<\/a><\/strong>\u00a0Sharma, Shubham (2025-01-20).\u00a0<a href=\"https:\/\/venturebeat.com\/ai\/open-source-deepseek-r1-uses-pure-reinforcement-learning-to-match-openai-o1-at-95-less-cost\/\">&#8220;Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 \u2014 at 95% less cost&#8221;<\/a>.\u00a0<em>VentureBeat<\/em>. Retrieved\u00a02025-01-26.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-20\">^<\/a><\/strong>\u00a0Zia, Dr Tehseen (2024-01-08).\u00a0<a href=\"https:\/\/www.unite.ai\/unveiling-of-large-multimodal-models-shaping-the-landscape-of-language-models-in-2024\/\">&#8220;Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024&#8221;<\/a>.\u00a0<em>Unite.AI<\/em>. Retrieved\u00a02024-12-28.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-21\">^<\/a><\/strong>\u00a0Peng, Bo; et\u00a0al. (2023). &#8220;RWKV: Reinventing RNNS for the Transformer Era&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.13048\">2305.13048<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-22\">^<\/a><\/strong>\u00a0Merritt, Rick (2022-03-25).\u00a0<a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/25\/what-is-a-transformer-model\/\">&#8220;What Is a Transformer Model?&#8221;<\/a>.\u00a0<em>NVIDIA Blog<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20231117203924\/https:\/\/blogs.nvidia.com\/blog\/what-is-a-transformer-model\/\">Archived<\/a>\u00a0from the original on 2023-11-17. Retrieved\u00a02023-07-25.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-23\">^<\/a><\/strong>\u00a0Gu, Albert; Dao, Tri (2023-12-01),\u00a0<em>Mamba: Linear-Time Sequence Modeling with Selective State Spaces<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2312.00752\">2312.00752<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-24\">^<\/a><\/strong>\u00a0Kaushal, Ayush; Mahowald, Kyle (2022-06-06),\u00a0<em>What do tokens know about their characters and how do they know it?<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2206.02608\">2206.02608<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-25\">^<\/a><\/strong>\u00a0Yennie Jun (2023-05-03).\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230817165705\/https:\/\/blog.yenniejun.com\/p\/all-languages-are-not-created-tokenized\">&#8220;All languages are NOT created (tokenized) equal&#8221;<\/a>.\u00a0<em>Language models cost much more in some languages than others<\/em>. Archived from\u00a0<a href=\"https:\/\/blog.yenniejun.com\/p\/all-languages-are-not-created-tokenized\">the original<\/a>\u00a0on 2023-08-17. Retrieved\u00a02023-08-17.\u00a0<q>In other words, to express the same sentiment, some languages require up to 10 times more tokens.<\/q><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-26\">^<\/a><\/strong>\u00a0Petrov, Aleksandar; Malfa, Emanuele La; Torr, Philip; Bibi, Adel (June 23, 2023).\u00a0<a href=\"https:\/\/openreview.net\/forum?id=Pj4YYuxTq9\">&#8220;Language Model Tokenizers Introduce Unfairness Between Languages&#8221;<\/a>.\u00a0<em>NeurIPS<\/em>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.15425\">2305.15425<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20231215212906\/https:\/\/openreview.net\/forum?id=Pj4YYuxTq9\">Archived<\/a>\u00a0from the original on December 15, 2023. Retrieved\u00a0September 16,\u00a02023\u00a0\u2013 via openreview.net.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-xbiWb_27-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230423211308\/https:\/\/platform.openai.com\/tokenizer\">&#8220;OpenAI API&#8221;<\/a>.\u00a0<em>platform.openai.com<\/em>. Archived from\u00a0<a href=\"https:\/\/platform.openai.com\/\">the original<\/a>\u00a0on April 23, 2023. Retrieved\u00a02023-04-30.<\/li>\n\n\n\n<li>^\u00a0Paa\u00df, Gerhard; Giesselbach, Sven (2022).\u00a0<a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-23190-2_2\">&#8220;Pre-trained Language Models&#8221;<\/a>.\u00a0<em>Foundation Models for Natural Language Processing<\/em>. Artificial Intelligence: Foundations, Theory, and Algorithms. pp.\u00a019\u201378.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1007%2F978-3-031-23190-2_2\">10.1007\/978-3-031-23190-2_2<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISBN_(identifier)\">ISBN<\/a>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/9783031231902\"><bdi>9783031231902<\/bdi><\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230803212329\/https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-23190-2_2\">Archived<\/a>\u00a0from the original on 3 August 2023. Retrieved\u00a03 August\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-29\">^<\/a><\/strong>\u00a0Petrov, Aleksandar; Emanuele La Malfa; Torr, Philip H. S.; Bibi, Adel (2023). &#8220;Language Model Tokenizers Introduce Unfairness Between Languages&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.15425\">2305.15425<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-30\">^<\/a><\/strong>\u00a0Lundberg, Scott (2023-12-12).\u00a0<a href=\"https:\/\/towardsdatascience.com\/the-art-of-prompt-design-prompt-boundaries-and-token-healing-3b2448b0be38\">&#8220;The Art of Prompt Design: Prompt Boundaries and Token Healing&#8221;<\/a>.\u00a0<em>Medium<\/em>. Retrieved\u00a02024-08-05.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-aYNg4_31-0\">^<\/a><\/strong>\u00a0Dodge, Jesse; Sap, Maarten; Marasovi\u0107, Ana; Agnew, William; Ilharco, Gabriel; Groeneveld, Dirk; Mitchell, Margaret; Gardner, Matt (2021). &#8220;Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2104.08758\">2104.08758<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-32\">^<\/a><\/strong>\u00a0Lee, Katherine; Ippolito, Daphne; Nystrom, Andrew; Zhang, Chiyuan; Eck, Douglas; Callison-Burch, Chris; Carlini, Nicholas (May 2022).\u00a0<a href=\"https:\/\/aclanthology.org\/2022.acl-long.577.pdf\">&#8220;Deduplicating Training Data Makes Language Models Better&#8221;<\/a>\u00a0(PDF).\u00a0<em>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics<\/em>. 1: Long Papers:\u00a08424\u20138445.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2022.acl-long.577\">10.18653\/v1\/2022.acl-long.577<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-33\">^<\/a><\/strong>\u00a0Li, Yuanzhi; Bubeck, S\u00e9bastien; Eldan, Ronen; Del Giorno, Allie; Gunasekar, Suriya; Lee, Yin Tat (2023-09-11),\u00a0<em>Textbooks Are All You Need II: phi-1.5 technical report<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2309.05463\">2309.05463<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-34\">^<\/a><\/strong>\u00a0Lin, Zhenghao; Gou, Zhibin; Gong, Yeyun; Liu, Xiao; Shen, Yelong; Xu, Ruochen; Lin, Chen; Yang, Yujiu; Jiao, Jian (2024-04-11). &#8220;Rho-1: Not All Tokens Are What You Need&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2404.07965\">2404.07965<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-qbFw1_35-0\">^<\/a><\/strong>\u00a0Brown, Tom B.; et\u00a0al. (2020). &#8220;Language Models are Few-Shot Learners&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2005.14165\">2005.14165<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-36\">^<\/a><\/strong>\u00a0Abdin, Marah; Jacobs, Sam Ade; Awan, Ammar Ahmad; Aneja, Jyoti; Awadallah, Ahmed; Awadalla, Hany; Bach, Nguyen; Bahree, Amit; Bakhtiari, Arash (2024-04-23). &#8220;Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2404.14219\">2404.14219<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-instructGPT-paper_37-0\">^<\/a><\/strong>\u00a0Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan (2022). &#8220;Training language models to follow instructions with human feedback&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2203.02155\">2203.02155<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-self-instruct-paper_38-0\">^<\/a><\/strong>\u00a0Wang, Yizhong; Kordi, Yeganeh; Mishra, Swaroop; Liu, Alisa; Smith, Noah A.; Khashabi, Daniel; Hajishirzi, Hannaneh (2022). &#8220;Self-Instruct: Aligning Language Model with Self Generated Instructions&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2212.10560\">2212.10560<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-HGZCJ_39-0\">^<\/a><\/strong>\u00a0Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff (2017-01-01). &#8220;Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/1701.06538\">1701.06538<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-R9Qq5_40-0\">^<\/a><\/strong>\u00a0Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng (2021-01-12). &#8220;GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2006.16668\">2006.16668<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-glam-blog_41-0\">^<\/a><\/strong>\u00a0Dai, Andrew M; Du, Nan (December 9, 2021).\u00a0<a href=\"https:\/\/ai.googleblog.com\/2021\/12\/more-efficient-in-context-learning-with.html\">&#8220;More Efficient In-Context Learning with GLaM&#8221;<\/a>.\u00a0<em>ai.googleblog.com<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230312072042\/https:\/\/ai.googleblog.com\/2021\/12\/more-efficient-in-context-learning-with.html\">Archived<\/a>\u00a0from the original on 2023-03-12. Retrieved\u00a02023-03-09.<\/li>\n\n\n\n<li>^\u00a0Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022).\u00a0<a href=\"https:\/\/openreview.net\/forum?id=yzkSU5zdwD\">&#8220;Emergent Abilities of Large Language Models&#8221;<\/a>.\u00a0<em>Transactions on Machine Learning Research<\/em>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISSN_(identifier)\">ISSN<\/a>\u00a0<a href=\"https:\/\/search.worldcat.org\/issn\/2835-8856\">2835-8856<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230322210052\/https:\/\/openreview.net\/forum?id=yzkSU5zdwD\">Archived<\/a>\u00a0from the original on 22 March 2023. Retrieved\u00a019 March\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Jay_Allamar_43-0\">^<\/a><\/strong>\u00a0Allamar, Jay.\u00a0<a href=\"https:\/\/jalammar.github.io\/illustrated-transformer\/\">&#8220;Illustrated transformer&#8221;<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230725230033\/http:\/\/jalammar.github.io\/illustrated-transformer\/\">Archived<\/a>\u00a0from the original on 2023-07-25. Retrieved\u00a02023-07-29.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Jay_Allamar_GPT2_44-0\">^<\/a><\/strong>\u00a0Allamar, Jay.\u00a0<a href=\"https:\/\/jalammar.github.io\/illustrated-gpt2\/\">&#8220;The Illustrated GPT-2 (Visualizing Transformer Language Models)&#8221;<\/a>. Retrieved\u00a02023-08-01.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-45\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/blog.google\/technology\/ai\/google-gemini-next-generation-model-february-2024\/#context-window\">&#8220;Our next-generation model: Gemini 1.5&#8221;<\/a>.\u00a0<em>Google<\/em>. 15 February 2024.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240218141522\/https:\/\/blog.google\/technology\/ai\/google-gemini-next-generation-model-february-2024\/#context-window\">Archived<\/a>\u00a0from the original on 18 February 2024. Retrieved\u00a018 February\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-46\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.anthropic.com\/news\/claude-2-1-prompting\">&#8220;Long context prompting for Claude 2.1&#8221;<\/a>. December 6, 2023.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240827053830\/https:\/\/www.anthropic.com\/news\/claude-2-1-prompting\">Archived<\/a>\u00a0from the original on August 27, 2024. Retrieved\u00a0January 20,\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-47\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/platform.openai.com\/docs\/guides\/rate-limits\">&#8220;Rate limits&#8221;<\/a>.\u00a0<em>openai.com<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240202003219\/https:\/\/platform.openai.com\/docs\/guides\/rate-limits\">Archived<\/a>\u00a0from the original on February 2, 2024. Retrieved\u00a0January 20,\u00a02024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-ioUpE_48-0\">^<\/a><\/strong>\u00a0Zaib, Munazza; Sheng, Quan Z.; Emma Zhang, Wei (4 February 2020).\u00a0<a href=\"https:\/\/www.researchgate.net\/publication\/338931711\">&#8220;A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP&#8221;<\/a>.\u00a0<em>Proceedings of the Australasian Computer Science Week Multiconference<\/em>. pp.\u00a01\u20134.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2104.10810\">2104.10810<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1145%2F3373017.3373028\">10.1145\/3373017.3373028<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISBN_(identifier)\">ISBN<\/a>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/9781450376976\"><bdi>9781450376976<\/bdi><\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:211040895\">211040895<\/a>.<\/li>\n\n\n\n<li>^ Jurafsky, Dan; Martin, James H. (7 January 2023).\u00a0<a href=\"https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\"><em>Speech and Language Processing<\/em><\/a>\u00a0(PDF)\u00a0(3rd edition draft\u00a0ed.).\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230323210221\/https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\">Archived<\/a>\u00a0(PDF)\u00a0from the original on 23 March 2023. Retrieved\u00a024 May\u00a02022.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-50\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/imbue.com\/research\/70b-infrastructure\/\">&#8220;From bare metal to a 70B model: infrastructure set-up and scripts&#8221;<\/a>.\u00a0<em>imbue.com<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240726203419\/https:\/\/imbue.com\/research\/70b-infrastructure\/\">Archived<\/a>\u00a0from the original on 2024-07-26. Retrieved\u00a02024-07-24.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-51\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/github.com\/facebookresearch\/metaseq\/tree\/main\/projects\/OPT\/chronicles\">&#8220;metaseq\/projects\/OPT\/chronicles at main \u00b7 facebookresearch\/metaseq&#8221;<\/a>.\u00a0<em>GitHub<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240124035658\/https:\/\/github.com\/facebookresearch\/metaseq\/tree\/main\/projects\/OPT\/chronicles\">Archived<\/a>\u00a0from the original on 2024-01-24. Retrieved\u00a02024-07-24.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-52\">^<\/a><\/strong>\u00a0Albrecht, Josh (2024-07-23).\u00a0<a href=\"https:\/\/www.latent.space\/p\/llm-training-2024\">&#8220;State of the Art: Training >70B LLMs on 10,000 H100 clusters&#8221;<\/a>.\u00a0<em>www.latent.space<\/em>. Retrieved\u00a02024-07-24.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-53\">^<\/a><\/strong>\u00a0Maslej, Nestor; Fattorini, Loredana; Brynjolfsson, Erik; Etchemendy, John; Ligett, Katrina; Lyons, Terah; Manyika, James; Ngo, Helen; Niebles, Juan Carlos (2023-10-05),\u00a0<em>Artificial Intelligence Index Report 2023<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2310.03715\">2310.03715<\/a><\/li>\n\n\n\n<li>^\u00a0Section 2.1 and Table 1,\u00a0Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario (2020). &#8220;Scaling Laws for Neural Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2001.08361\">2001.08361<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-PI1fW_55-0\">^<\/a><\/strong>\u00a0Gao, Luyu; Madaan, Aman; Zhou, Shuyan; Alon, Uri; Liu, Pengfei; Yang, Yiming; Callan, Jamie; Neubig, Graham (2022-11-01). &#8220;PAL: Program-aided Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2211.10435\">2211.10435<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-J5OW5_56-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/reasonwithpal.com\/\">&#8220;PAL: Program-aided Language Models&#8221;<\/a>.\u00a0<em>reasonwithpal.com<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230612162208\/https:\/\/reasonwithpal.com\/\">Archived<\/a>\u00a0from the original on 2023-06-12. Retrieved\u00a02023-06-12.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-gQxzq_57-0\">^<\/a><\/strong>\u00a0Paranjape, Bhargavi; Lundberg, Scott; Singh, Sameer; Hajishirzi, Hannaneh; Zettlemoyer, Luke; Tulio Ribeiro, Marco (2023-03-01). &#8220;ART: Automatic multi-step reasoning and tool-use for large language models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.09014\">2303.09014<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-lLrda_58-0\">^<\/a><\/strong>\u00a0Liang, Yaobo; Wu, Chenfei; Song, Ting; Wu, Wenshan; Xia, Yan; Liu, Yu; Ou, Yang; Lu, Shuai; Ji, Lei; Mao, Shaoguang; Wang, Yun; Shou, Linjun; Gong, Ming; Duan, Nan (2023-03-01). &#8220;TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.16434\">2303.16434<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-4Xzrs_59-0\">^<\/a><\/strong>\u00a0Patil, Shishir G.; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E. (2023-05-01). &#8220;Gorilla: Large Language Model Connected with Massive APIs&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.15334\">2305.15334<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-BUZBP_60-0\">^<\/a><\/strong>\u00a0Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; K\u00fcttler, Heinrich; Lewis, Mike; Yih, Wen-tau; Rockt\u00e4schel, Tim; Riedel, Sebastian; Kiela, Douwe (2020).\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/6b493230205f780e1bc26945df7481e5-Abstract.html\">&#8220;Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks&#8221;<\/a>.\u00a0<em>Advances in Neural Information Processing Systems<\/em>.\u00a0<strong>33<\/strong>. Curran Associates, Inc.:\u00a09459\u20139474.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">2005.11401<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230612171229\/https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/6b493230205f780e1bc26945df7481e5-Abstract.html\">Archived<\/a>\u00a0from the original on 2023-06-12. Retrieved\u00a02023-06-12.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-61\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.kdnuggets.com\/the-growth-behind-llmbased-autonomous-agents\">&#8220;The Growth Behind LLM-based Autonomous Agents&#8221;<\/a>.\u00a0<em>KDnuggets<\/em>. October 23, 2023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-DmvNE_62-0\">^<\/a><\/strong>\u00a0Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan (2022-10-01). &#8220;ReAct: Synergizing Reasoning and Acting in Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2210.03629\">2210.03629<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-JS8Vd_63-0\">^<\/a><\/strong>\u00a0Wu, Yue; Prabhumoye, Shrimai; Min, So Yeon (24 May 2023). &#8220;SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.15486\">2305.15486<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-64\">^<\/a><\/strong>\u00a0Wang, Zihao; Cai, Shaofei; Liu, Anji; Ma, Xiaojian; Liang, Yitao (2023-02-03). &#8220;Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2302.01560\">2302.01560<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-sbB2T_65-0\">^<\/a><\/strong>\u00a0Shinn, Noah; Cassano, Federico; Labash, Beck; Gopinath, Ashwin; Narasimhan, Karthik; Yao, Shunyu (2023-03-01). &#8220;Reflexion: Language Agents with Verbal Reinforcement Learning&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.11366\">2303.11366<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-ltTer_66-0\">^<\/a><\/strong>\u00a0Hao, Shibo; Gu, Yi; Ma, Haodi; Jiahua Hong, Joshua; Wang, Zhen; Zhe Wang, Daisy; Hu, Zhiting (2023-05-01). &#8220;Reasoning with Language Model is Planning with World Model&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.14992\">2305.14992<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-mBvD9_67-0\">^<\/a><\/strong>\u00a0Zhang, Jenny; Lehman, Joel; Stanley, Kenneth; Clune, Jeff (2 June 2023). &#8220;OMNI: Open-endedness via Models of human Notions of Interestingness&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2306.01711\">2306.01711<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li>^\u00a0<a href=\"https:\/\/voyager.minedojo.org\/\">&#8220;Voyager | An Open-Ended Embodied Agent with Large Language Models&#8221;<\/a>.\u00a0<em>voyager.minedojo.org<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230608225054\/https:\/\/voyager.minedojo.org\/\">Archived<\/a>\u00a0from the original on 2023-06-08. Retrieved\u00a02023-06-09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-XuvjF_69-0\">^<\/a><\/strong>\u00a0Park, Joon Sung; O&#8217;Brien, Joseph C.; Cai, Carrie J.; Ringel Morris, Meredith; Liang, Percy; Bernstein, Michael S. (2023-04-01). &#8220;Generative Agents: Interactive Simulacra of Human Behavior&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2304.03442\">2304.03442<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.HC\">cs.HC<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-70\">^<\/a><\/strong>\u00a0Mann, Tobias.\u00a0<a href=\"https:\/\/www.theregister.com\/2024\/03\/17\/ai_pc_local_llm\/\">&#8220;How to run an LLM locally on your PC in less than 10 minutes&#8221;<\/a>.\u00a0<em>www.theregister.com<\/em>. Retrieved\u00a02024-05-17.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-LS2Go_71-0\">^<\/a><\/strong>\u00a0Nagel, Markus; Amjad, Rana Ali; Baalen, Mart Van; Louizos, Christos; Blankevoort, Tijmen (2020-11-21).\u00a0<a href=\"https:\/\/proceedings.mlr.press\/v119\/nagel20a.html\">&#8220;Up or Down? Adaptive Rounding for Post-Training Quantization&#8221;<\/a>.\u00a0<em>Proceedings of the 37th International Conference on Machine Learning<\/em>. PMLR:\u00a07197\u20137206.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230614080854\/https:\/\/proceedings.mlr.press\/v119\/nagel20a.html\">Archived<\/a>\u00a0from the original on 2023-06-14. Retrieved\u00a02023-06-14.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-cpzcK_72-0\">^<\/a><\/strong>\u00a0Polino, Antonio; Pascanu, Razvan; Alistarh, Dan (2018-02-01). &#8220;Model compression via distillation and quantization&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/1802.05668\">1802.05668<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.NE\">cs.NE<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-QVU95_73-0\">^<\/a><\/strong>\u00a0Frantar, Elias; Ashkboos, Saleh; Hoefler, Torsten; Alistarh, Dan (2022-10-01). &#8220;GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2210.17323\">2210.17323<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-dU9Bu_74-0\">^<\/a><\/strong>\u00a0Dettmers, Tim; Svirschevski, Ruslan; Egiazarian, Vage; Kuznedelev, Denis; Frantar, Elias; Ashkboos, Saleh; Borzunov, Alexander; Hoefler, Torsten; Alistarh, Dan (2023-06-01). &#8220;SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2306.03078\">2306.03078<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-75\">^<\/a><\/strong>\u00a0Grootendorst, Maarten.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240731003355\/https:\/\/newsletter.maartengrootendorst.com\/p\/a-visual-guide-to-quantization\">&#8220;A Visual Guide to Quantization&#8221;<\/a>.\u00a0<em>newsletter.maartengrootendorst.com<\/em>. Archived from\u00a0<a href=\"https:\/\/newsletter.maartengrootendorst.com\/p\/a-visual-guide-to-quantization\">the original<\/a>\u00a0on 31 Jul 2024. Retrieved\u00a02024-07-31.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-D0nFA_76-0\">^<\/a><\/strong>\u00a0Dettmers, Tim; Pagnoni, Artidoro;\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Ari_Holtzman\">Holtzman, Ari<\/a>; Zettlemoyer, Luke (2023-05-01). &#8220;QLoRA: Efficient Finetuning of Quantized LLMs&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.14314\">2305.14314<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-77\">^<\/a><\/strong>\u00a0Kiros, Ryan; Salakhutdinov, Ruslan; Zemel, Rich (2014-06-18).\u00a0<a href=\"https:\/\/proceedings.mlr.press\/v32\/kiros14.html\">&#8220;Multimodal Neural Language Models&#8221;<\/a>.\u00a0<em>Proceedings of the 31st International Conference on Machine Learning<\/em>. PMLR:\u00a0595\u2013603.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230702195952\/https:\/\/proceedings.mlr.press\/v32\/kiros14.html\">Archived<\/a>\u00a0from the original on 2023-07-02. Retrieved\u00a02023-07-02.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-78\">^<\/a><\/strong>\u00a0Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E (2012).\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2012\/hash\/c399862d3b9d6b76c8436e924a68c45b-Abstract.html\">&#8220;ImageNet Classification with Deep Convolutional Neural Networks&#8221;<\/a>.\u00a0<em>Advances in Neural Information Processing Systems<\/em>.\u00a0<strong>25<\/strong>. Curran Associates, Inc.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230702195952\/https:\/\/proceedings.neurips.cc\/paper\/2012\/hash\/c399862d3b9d6b76c8436e924a68c45b-Abstract.html\">Archived<\/a>\u00a0from the original on 2023-07-02. Retrieved\u00a02023-07-02.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-79\">^<\/a><\/strong>\u00a0Antol, Stanislaw; Agrawal, Aishwarya; Lu, Jiasen; Mitchell, Margaret; Batra, Dhruv; Zitnick, C. Lawrence; Parikh, Devi (2015).\u00a0<a href=\"https:\/\/openaccess.thecvf.com\/content_iccv_2015\/html\/Antol_VQA_Visual_Question_ICCV_2015_paper.html\">&#8220;VQA: Visual Question Answering&#8221;<\/a>.\u00a0<em>ICCV<\/em>:\u00a02425\u20132433.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230702195952\/https:\/\/openaccess.thecvf.com\/content_iccv_2015\/html\/Antol_VQA_Visual_Question_ICCV_2015_paper.html\">Archived<\/a>\u00a0from the original on 2023-07-02. Retrieved\u00a02023-07-02.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-80\">^<\/a><\/strong>\u00a0Li, Junnan; Li, Dongxu; Savarese, Silvio; Hoi, Steven (2023-01-01). &#8220;BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2301.12597\">2301.12597<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CV\">cs.CV<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-81\">^<\/a><\/strong>\u00a0Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao (2022-12-06).\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/hash\/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html\">&#8220;Flamingo: a Visual Language Model for Few-Shot Learning&#8221;<\/a>.\u00a0<em>Advances in Neural Information Processing Systems<\/em>.\u00a0<strong>35<\/strong>:\u00a023716\u201323736.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2204.14198\">2204.14198<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230702195951\/https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/hash\/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html\">Archived<\/a>\u00a0from the original on 2023-07-02. Retrieved\u00a02023-07-02.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-82\">^<\/a><\/strong>\u00a0Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; Lynch, Corey; Chowdhery, Aakanksha; Ichter, Brian; Wahid, Ayzaan; Tompson, Jonathan; Vuong, Quan; Yu, Tianhe; Huang, Wenlong; Chebotar, Yevgen; Sermanet, Pierre; Duckworth, Daniel; Levine, Sergey (2023-03-01). &#8220;PaLM-E: An Embodied Multimodal Language Model&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.03378\">2303.03378<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-83\">^<\/a><\/strong>\u00a0Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae (2023-04-01). &#8220;Visual Instruction Tuning&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2304.08485\">2304.08485<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CV\">cs.CV<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-84\">^<\/a><\/strong>\u00a0Zhang, Hang; Li, Xin; Bing, Lidong (2023-06-01). &#8220;Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2306.02858\">2306.02858<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-85\">^<\/a><\/strong>\u00a0OpenAI (2023-03-27). &#8220;GPT-4 Technical Report&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.08774\">2303.08774<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-86\">^<\/a><\/strong>\u00a0OpenAI (September 25, 2023).\u00a0<a href=\"https:\/\/cdn.openai.com\/papers\/GPTV_System_Card.pdf\">&#8220;GPT-4V(ision) System Card&#8221;<\/a>\u00a0(PDF).<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-87\">^<\/a><\/strong>\u00a0Pichai, Sundar (10 May 2023),\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=cNfINi5CNbY&amp;t=931s\"><em>Google Keynote (Google I\/O &#8217;23)<\/em><\/a>, timestamp 15:31, retrieved\u00a02023-07-02<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-88\">^<\/a><\/strong>\u00a0Wiggers, Kyle (11 September 2024).\u00a0<a href=\"https:\/\/techcrunch.com\/2024\/09\/11\/mistral-releases-pixtral-its-first-multimodal-model\/?utm_medium=aisecret.us&amp;utm_source=aisecret.us&amp;utm_campaign=aisecret.us\">&#8220;Mistral releases Pixtral 12B, its first multimodal model&#8221;<\/a>.\u00a0<em>TechCrunch<\/em>. Retrieved\u00a014 September\u00a02024.<\/li>\n\n\n\n<li>^\u00a0<a href=\"https:\/\/openai.com\/index\/introducing-openai-o1-preview\/\">&#8220;Introducing OpenAI o1-preview&#8221;<\/a>.\u00a0<em>OpenAI<\/em>. 2024-09-12. Retrieved\u00a02025-02-03.<\/li>\n\n\n\n<li>^\u00a0Metz, Cade (2024-12-20).\u00a0<a href=\"https:\/\/www.nytimes.com\/2024\/12\/20\/technology\/openai-new-ai-math-science.html\">&#8220;OpenAI Unveils New A.I. That Can &#8216;Reason&#8217; Through Math and Science Problems&#8221;<\/a>.\u00a0<em>The New York Times<\/em>. Retrieved\u00a02025-02-03.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-nature-deepseek_91-0\">^<\/a><\/strong>\u00a0Gibney, Elizabeth (2025-01-30).\u00a0<a href=\"https:\/\/www.nature.com\/articles\/d41586-025-00229-6\">&#8220;China&#8217;s cheap, open AI model DeepSeek thrills scientists&#8221;<\/a>.\u00a0<em>Nature<\/em>. Retrieved\u00a02025-02-03.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Lin-2025-02-05-WSJ_92-0\">^<\/a><\/strong>\u00a0Lin, Belle (2025-02-05).\u00a0<a href=\"https:\/\/www.wsj.com\/articles\/why-amazon-is-betting-on-automated-reasoning-to-reduce-ais-hallucinations-b838849e\">&#8220;Why Amazon is Betting on &#8216;Automated Reasoning&#8217; to Reduce AI&#8217;s Hallucinations: The tech giant says an obscure field that combines AI and math can mitigate\u2014but not completely eliminate\u2014AI&#8217;s propensity to provide wrong answers&#8221;<\/a>.\u00a0<em>Wall Street Journal<\/em>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISSN_(identifier)\">ISSN<\/a>\u00a0<a href=\"https:\/\/search.worldcat.org\/issn\/0099-9660\">0099-9660<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-fJta3_93-0\">^<\/a><\/strong>\u00a0Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan (2022-03-29). &#8220;Training Compute-Optimal Large Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2203.15556\">2203.15556<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li>^\u00a0Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). &#8220;Broken Neural Scaling Laws&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2210.14891\">2210.14891<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-JM6s1_95-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.jasonwei.net\/blog\/emergence\">&#8220;137 emergent abilities of large language models&#8221;<\/a>.\u00a0<em>Jason Wei<\/em>. Retrieved\u00a02023-06-24.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Bowman_96-0\">^<\/a><\/strong>\u00a0Bowman, Samuel R. (2023). &#8220;Eight Things to Know about Large Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2304.00612\">2304.00612<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Heuristic-Mukherjee_97-0\">^<\/a><\/strong>\u00a0Mukherjee, Anirban; Chang, Hannah (2024). &#8220;Heuristic Reasoning in AI: Instrumental Use and Mimetic Absorption&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2403.09404\">2403.09404<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Hahn_20230314_98-0\">^<\/a><\/strong>\u00a0Hahn, Michael; Goyal, Navin (2023-03-14). &#8220;A Theory of Emergent In-Context Learning as Implicit Structure Induction&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.07971\">2303.07971<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-57FEA_99-0\">^<\/a><\/strong>\u00a0Pilehvar, Mohammad Taher; Camacho-Collados, Jose (June 2019).\u00a0<a href=\"https:\/\/aclanthology.org\/N19-1128\">&#8220;Proceedings of the 2019 Conference of the North&#8221;<\/a>.\u00a0<em>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)<\/em>. Minneapolis, Minnesota: Association for Computational Linguistics:\u00a01267\u20131273.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2FN19-1128\">10.18653\/v1\/N19-1128<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:102353817\">102353817<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230627202732\/https:\/\/aclanthology.org\/N19-1128\/\">Archived<\/a>\u00a0from the original on 2023-06-27. Retrieved\u00a02023-06-27.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-TEIkA_100-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/pilehvar.github.io\/wic\/\">&#8220;WiC: The Word-in-Context Dataset&#8221;<\/a>.\u00a0<em>pilehvar.github.io<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230627202725\/https:\/\/pilehvar.github.io\/wic\/\">Archived<\/a>\u00a0from the original on 2023-06-27. Retrieved\u00a02023-06-27.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-zgy1i_101-0\">^<\/a><\/strong>\u00a0Patel, Roma; Pavlick, Ellie (2021-10-06).\u00a0<a href=\"https:\/\/openreview.net\/forum?id=gJcEM8sxHK\">&#8220;Mapping Language Models to Grounded Conceptual Spaces&#8221;<\/a>.\u00a0<em>ICLR<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230624191940\/https:\/\/openreview.net\/forum?id=gJcEM8sxHK\">Archived<\/a>\u00a0from the original on 2023-06-24. Retrieved\u00a02023-06-27.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Imb98_102-0\">^<\/a><\/strong>\u00a0<em><a href=\"https:\/\/www.notion.so\/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f\">A Closer Look at Large Language Models Emergent Abilities<\/a>\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230624012329\/https:\/\/www.notion.so\/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f\">Archived<\/a>\u00a02023-06-24 at the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Wayback_Machine\">Wayback Machine<\/a><\/em>\u00a0(Yao Fu, Nov 20, 2022)<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-CeQVF_103-0\">^<\/a><\/strong>\u00a0Ornes, Stephen (March 16, 2023).\u00a0<a href=\"https:\/\/www.quantamagazine.org\/the-unpredictable-abilities-emerging-from-large-ai-models-20230316\/\">&#8220;The Unpredictable Abilities Emerging From Large AI Models&#8221;<\/a>.\u00a0<em>Quanta Magazine<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230316203438\/https:\/\/www.quantamagazine.org\/the-unpredictable-abilities-emerging-from-large-ai-models-20230316\/\">Archived<\/a>\u00a0from the original on March 16, 2023. Retrieved\u00a0March 16,\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-C775b_104-0\">^<\/a><\/strong>\u00a0Schaeffer, Rylan; Miranda, Brando; Koyejo, Sanmi (2023-04-01). &#8220;Are Emergent Abilities of Large Language Models a Mirage?&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2304.15004\">2304.15004<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.AI\">cs.AI<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-105\">^<\/a><\/strong>\u00a0Blank, Idan A. (November 2023).\u00a0<a href=\"https:\/\/doi.org\/10.1016%2Fj.tics.2023.08.006\">&#8220;What are large language models supposed to model?&#8221;<\/a>.\u00a0<em>Trends in Cognitive Sciences<\/em>.\u00a0<strong>27<\/strong>\u00a0(11):\u00a0987\u2013989.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1016%2Fj.tics.2023.08.006\">10.1016\/j.tics.2023.08.006<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/PMID_(identifier)\">PMID<\/a>\u00a0<a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/37659920\">37659920<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-IZSIr_106-0\">^<\/a><\/strong>\u00a0Li, Kenneth; Hopkins, Aspen K.; Bau, David; Vi\u00e9gas, Fernanda; Pfister, Hanspeter; Wattenberg, Martin (2022-10-01). &#8220;Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2210.13382\">2210.13382<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-RLik9_107-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/thegradient.pub\/othello\/\">&#8220;Large Language Model: world models or surface statistics?&#8221;<\/a>.\u00a0<em>The Gradient<\/em>. 2023-01-21. Retrieved\u00a02023-06-12.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-Hln1l_108-0\">^<\/a><\/strong>\u00a0Jin, Charles; Rinard, Martin (2023-05-01). &#8220;Evidence of Meaning in Language Models Trained on Programs&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.11169\">2305.11169<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-oYGlo_109-0\">^<\/a><\/strong>\u00a0Nanda, Neel; Chan, Lawrence; Lieberum, Tom; Smith, Jess; Steinhardt, Jacob (2023-01-01). &#8220;Progress measures for grokking via mechanistic interpretability&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2301.05217\">2301.05217<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li>^\u00a0Mitchell, Melanie; Krakauer, David C. (28 March 2023).\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10068812\">&#8220;The debate over understanding in AI&#8217;s large language models&#8221;<\/a>.\u00a0<em>Proceedings of the National Academy of Sciences<\/em>.\u00a0<strong>120<\/strong>\u00a0(13): e2215907120.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2210.13966\">2210.13966<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Bibcode_(identifier)\">Bibcode<\/a>:<a href=\"https:\/\/ui.adsabs.harvard.edu\/abs\/2023PNAS..12015907M\">2023PNAS..12015907M<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1073%2Fpnas.2215907120\">10.1073\/pnas.2215907120<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/PMC_(identifier)\">PMC<\/a>\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10068812\">10068812<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/PMID_(identifier)\">PMID<\/a>\u00a0<a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/36943882\">36943882<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-O8Upd_111-0\">^<\/a><\/strong>\u00a0Metz, Cade (16 May 2023).\u00a0<a href=\"https:\/\/www.nytimes.com\/2023\/05\/16\/technology\/microsoft-ai-human-reasoning.html\">&#8220;Microsoft Says New A.I. Shows Signs of Human Reasoning&#8221;<\/a>.\u00a0<em>The New York Times<\/em>.<\/li>\n\n\n\n<li>^\u00a0Bubeck, S\u00e9bastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (2023). &#8220;Sparks of Artificial General Intelligence: Early experiments with GPT-4&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.12712\">2303.12712<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-113\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.fastcompany.com\/91211163\/anthropic-ceo-dario-amodei-pens-a-smart-look-at-our-ai-future\">&#8220;Anthropic CEO Dario Amodei pens a smart look at our AI future&#8221;<\/a>.\u00a0<em>Fast Company<\/em>. October 17, 2024.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-rEEmH_114-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.zdnet.com\/article\/chatgpt-is-more-like-an-alien-intelligence-than-a-human-brain-says-futurist\/\">&#8220;ChatGPT is more like an &#8216;alien intelligence&#8217; than a human brain, says futurist&#8221;<\/a>.\u00a0<em>ZDNET<\/em>. 2023.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230612065937\/https:\/\/www.zdnet.com\/article\/chatgpt-is-more-like-an-alien-intelligence-than-a-human-brain-says-futurist\/\">Archived<\/a>\u00a0from the original on 12 June 2023. Retrieved\u00a012 June\u00a02023.<\/li>\n\n\n\n<li>^\u00a0Newport, Cal (13 April 2023).\u00a0<a href=\"https:\/\/www.newyorker.com\/science\/annals-of-artificial-intelligence\/what-kind-of-mind-does-chatgpt-have\">&#8220;What Kind of Mind Does ChatGPT Have?&#8221;<\/a>.\u00a0<em>The New Yorker<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230612071443\/https:\/\/www.newyorker.com\/science\/annals-of-artificial-intelligence\/what-kind-of-mind-does-chatgpt-have\">Archived<\/a>\u00a0from the original on 12 June 2023. Retrieved\u00a012 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-rAFIZ_116-0\">^<\/a><\/strong>\u00a0Roose, Kevin (30 May 2023).\u00a0<a href=\"https:\/\/www.nytimes.com\/2023\/05\/30\/technology\/shoggoth-meme-ai.html\">&#8220;Why an Octopus-like Creature Has Come to Symbolize the State of A.I.&#8221;<\/a>\u00a0<em>The New York Times<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230530193814\/https:\/\/www.nytimes.com\/2023\/05\/30\/technology\/shoggoth-meme-ai.html\">Archived<\/a>\u00a0from the original on 30 May 2023. Retrieved\u00a012 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-4luKE_117-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/time.com\/6271657\/a-to-z-of-artificial-intelligence\/\">&#8220;The A to Z of Artificial Intelligence&#8221;<\/a>.\u00a0<em>Time Magazine<\/em>. 13 April 2023.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230616123839\/https:\/\/time.com\/6271657\/a-to-z-of-artificial-intelligence\/\">Archived<\/a>\u00a0from the original on 16 June 2023. Retrieved\u00a012 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-hallucination-survey_118-0\">^<\/a><\/strong>\u00a0Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Dai, Wenliang; Madotto, Andrea; Fung, Pascale (November 2022).\u00a0<a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3571730\">&#8220;Survey of Hallucination in Natural Language Generation&#8221;<\/a>\u00a0(pdf).\u00a0<em>ACM Computing Surveys<\/em>.\u00a0<strong>55<\/strong>\u00a0(12).\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Association_for_Computing_Machinery\">Association for Computing Machinery<\/a>:\u00a01\u201338.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2202.03629\">2202.03629<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1145%2F3571730\">10.1145\/3571730<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:246652372\">246652372<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230326145635\/https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3571730\">Archived<\/a>\u00a0from the original on 26 March 2023. Retrieved\u00a015 January\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-119\">^<\/a><\/strong>\u00a0Varshney, Neeraj; Yao, Wenlin; Zhang, Hongming; Chen, Jianshu; Yu, Dong (2023). &#8220;A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2307.03987\">2307.03987<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-120\">^<\/a><\/strong>\u00a0Lakoff, George (1999).\u00a0<em>Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Philosophy; Appendix: The Neural Theory of Language Paradigm<\/em>. New York Basic Books. pp.\u00a0569\u2013583.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISBN_(identifier)\">ISBN<\/a>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/978-0-465-05674-3\"><bdi>978-0-465-05674-3<\/bdi><\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-121\">^<\/a><\/strong>\u00a0Evans, Vyvyan. (2014).\u00a0<em>The Language Myth<\/em>. Cambridge University Press.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISBN_(identifier)\">ISBN<\/a>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/978-1-107-04396-1\"><bdi>978-1-107-04396-1<\/bdi><\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-122\">^<\/a><\/strong>\u00a0Friston, Karl J. (2022).\u00a0<em>Active Inference: The Free Energy Principle in Mind, Brain, and Behavior; Chapter 4 The Generative Models of Active Inference<\/em>. The MIT Press.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISBN_(identifier)\">ISBN<\/a>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/978-0-262-36997-8\"><bdi>978-0-262-36997-8<\/bdi><\/a>.<\/li>\n\n\n\n<li>^\u00a0Huyen, Chip (October 18, 2019).\u00a0<a href=\"https:\/\/thegradient.pub\/understanding-evaluation-metrics-for-language-models\/\">&#8220;Evaluation Metrics for Language Modeling&#8221;<\/a>.\u00a0<em>The Gradient<\/em>. Retrieved\u00a0January 14,\u00a02024.<\/li>\n\n\n\n<li>^\u00a0Clark, Christopher; Lee, Kenton; Chang, Ming-Wei; Kwiatkowski, Tom; Collins, Michael; Toutanova, Kristina (2019). &#8220;BoolQ: Exploring the Surprising Difficulty of Natural Yes\/No Questions&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/1905.10044\">1905.10044<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li>^\u00a0Wayne Xin Zhao; Zhou, Kun; Li, Junyi; Tang, Tianyi; Wang, Xiaolei; Hou, Yupeng; Min, Yingqian; Zhang, Beichen; Zhang, Junjie; Dong, Zican; Du, Yifan; Yang, Chen; Chen, Yushuo; Chen, Zhipeng; Jiang, Jinhao; Ren, Ruiyang; Li, Yifan; Tang, Xinyu; Liu, Zikang; Liu, Peiyu; Nie, Jian-Yun; Wen, Ji-Rong (2023). &#8220;A Survey of Large Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.18223\">2303.18223<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-126\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/github.com\/openai\/simple-evals\"><em>openai\/simple-evals<\/em><\/a>, OpenAI, 2024-05-28, retrieved\u00a02024-05-28<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-127\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/github.com\/openai\/evals\"><em>openai\/evals<\/em><\/a>, OpenAI, 2024-05-28,\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240508225708\/https:\/\/github.com\/openai\/evals\">archived<\/a>\u00a0from the original on 2024-05-08, retrieved\u00a02024-05-28<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-128\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/imbue.com\/research\/70b-evals\/\">&#8220;Sanitized open-source datasets for natural language and code understanding: how we evaluated our 70B model&#8221;<\/a>.\u00a0<em>imbue.com<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240726173012\/https:\/\/imbue.com\/research\/70b-evals\/\">Archived<\/a>\u00a0from the original on 2024-07-26. Retrieved\u00a02024-07-24.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-129\">^<\/a><\/strong>\u00a0Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R. (November 2020).\u00a0<a href=\"https:\/\/aclanthology.org\/2020.emnlp-main.154\/\">&#8220;CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models&#8221;<\/a>. In Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang (ed.).\u00a0<em>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/em>. Association for Computational Linguistics. pp.\u00a01953\u20131967.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2010.00133\">2010.00133<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2020.emnlp-main.154\">10.18653\/v1\/2020.emnlp-main.154<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-130\">^<\/a><\/strong>\u00a0Nadeem, Moin and Bethke, Anna and Reddy, Siva (August 2021).\u00a0<a href=\"https:\/\/aclanthology.org\/2021.acl-long.416\/\">&#8220;StereoSet: Measuring stereotypical bias in pretrained language models&#8221;<\/a>. In Zong, Chengqing and Xia, Fei and Li, Wenjie and Navigli, Roberto (ed.).\u00a0<em>Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)<\/em>. Association for Computational Linguistics. pp.\u00a05356\u20135371.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2004.09456\">2004.09456<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2021.acl-long.416\">10.18653\/v1\/2021.acl-long.416<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-131\">^<\/a><\/strong>\u00a0Simpson, Shmona and Nukpezah, Jonathan and Kie Brooks and Pandya, Raaghav (17 December 2024).\u00a0<a href=\"https:\/\/doi.org\/10.1007%2Fs43681-024-00613-4\">&#8220;Parity benchmark for measuring bias in LLMs&#8221;<\/a>.\u00a0<em>AI and Ethics<\/em>. Springer.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1007%2Fs43681-024-00613-4\">10.1007\/s43681-024-00613-4<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-bigbench_132-0\">^<\/a><\/strong>\u00a0Srivastava, Aarohi; et\u00a0al. (2022). &#8220;Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2206.04615\">2206.04615<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-truthfulqa_133-0\">^<\/a><\/strong>\u00a0Lin, Stephanie; Hilton, Jacob; Evans, Owain (2021). &#8220;TruthfulQA: Measuring How Models Mimic Human Falsehoods&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2109.07958\">2109.07958<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li>^\u00a0Zellers, Rowan; Holtzman, Ari; Bisk, Yonatan; Farhadi, Ali; Choi, Yejin (2019). &#8220;HellaSwag: Can a Machine Really Finish Your Sentence?&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/1905.07830\">1905.07830<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-ZDTUM_135-0\">^<\/a><\/strong>\u00a0&#8220;Prepare for truly useful large language models&#8221;.\u00a0<em>Nature Biomedical Engineering<\/em>.\u00a0<strong>7<\/strong>\u00a0(2):\u00a085\u201386. 7 March 2023.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1038%2Fs41551-023-01012-6\">10.1038\/s41551-023-01012-6<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/PMID_(identifier)\">PMID<\/a>\u00a0<a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/36882584\">36882584<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:257403466\">257403466<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-81w7x_136-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.economist.com\/finance-and-economics\/2023\/05\/07\/your-job-is-probably-safe-from-artificial-intelligence\">&#8220;Your job is (probably) safe from artificial intelligence&#8221;<\/a>.\u00a0<em>The Economist<\/em>. 7 May 2023.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230617225618\/https:\/\/www.economist.com\/finance-and-economics\/2023\/05\/07\/your-job-is-probably-safe-from-artificial-intelligence\">Archived<\/a>\u00a0from the original on 17 June 2023. Retrieved\u00a018 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-zIM6Y_137-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.goldmansachs.com\/intelligence\/pages\/generative-ai-could-raise-global-gdp-by-7-percent.html\">&#8220;Generative AI Could Raise Global GDP by 7%&#8221;<\/a>.\u00a0<em>Goldman Sachs<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230618013836\/https:\/\/www.goldmansachs.com\/intelligence\/pages\/generative-ai-could-raise-global-gdp-by-7-percent.html\">Archived<\/a>\u00a0from the original on 18 June 2023. Retrieved\u00a018 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-138\">^<\/a><\/strong>\u00a0Peng, Zhencan; Wang, Zhizhi; Deng, Dong (13 June 2023).\u00a0<a href=\"https:\/\/people.cs.rutgers.edu\/~dd903\/assets\/papers\/sigmod23.pdf\">&#8220;Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation&#8221;<\/a>\u00a0(PDF).\u00a0<em>Proceedings of the ACM on Management of Data<\/em>.\u00a0<strong>1<\/strong>\u00a0(2):\u00a01\u201318.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1145%2F3589324\">10.1145\/3589324<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:259213212\">259213212<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240827053753\/https:\/\/people.cs.rutgers.edu\/~dd903\/assets\/papers\/sigmod23.pdf\">Archived<\/a>\u00a0(PDF)\u00a0from the original on 2024-08-27. Retrieved\u00a02024-01-20.\u00a0Citing Lee et al 2022.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-139\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#CITEREFPengWangDeng2023\">Peng, Wang &amp; Deng 2023<\/a>, p.\u00a08.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-140\">^<\/a><\/strong>\u00a0Stephen Council (1 Dec 2023).\u00a0<a href=\"https:\/\/www.sfgate.com\/tech\/article\/google-openai-chatgpt-break-model-18525445.php\">&#8220;How Googlers cracked an SF rival&#8217;s tech model with a single word&#8221;<\/a>. SFGATE.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20231216160941\/https:\/\/www.sfgate.com\/tech\/article\/google-openai-chatgpt-break-model-18525445.php\">Archived<\/a>\u00a0from the original on 16 December 2023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-nD6kH_141-0\">^<\/a><\/strong>\u00a0Alba, Davey (1 May 2023).\u00a0<a href=\"https:\/\/www.japantimes.co.jp\/news\/2023\/05\/01\/business\/tech\/ai-fake-news-content-farms\/\">&#8220;AI chatbots have been used to create dozens of news content farms&#8221;<\/a>.\u00a0<em>The Japan Times<\/em>. Retrieved\u00a018 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-PKiPY_142-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.science.org\/content\/article\/could-chatbots-help-devise-next-pandemic-virus\">&#8220;Could chatbots help devise the next pandemic virus?&#8221;<\/a>.\u00a0<em>Science<\/em>. 14 June 2023.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1126%2Fscience.adj2463\">10.1126\/science.adj2463<\/a>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230618013834\/https:\/\/www.science.org\/content\/article\/could-chatbots-help-devise-next-pandemic-virus\">Archived<\/a>\u00a0from the original on 18 June 2023. Retrieved\u00a018 June\u00a02023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-143\">^<\/a><\/strong>\u00a0Hubinger, Evan (10 January 2024). &#8220;Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2401.05566\">2401.05566<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CR\">cs.CR<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-144\">^<\/a><\/strong>\u00a0Kang, Daniel (2023). &#8220;Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2302.05733\">2302.05733<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CR\">cs.CR<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-145\">^<\/a><\/strong>\u00a0Wang, Yongge (20 June 2024).\u00a0<a href=\"https:\/\/eprint.iacr.org\/2024\/586.pdf\">&#8220;Encryption Based Covert Channel for Large Language Models&#8221;<\/a>\u00a0(PDF). IACR ePrint 2024\/586.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20240624191233\/https:\/\/eprint.iacr.org\/2024\/586.pdf\">Archived<\/a>\u00a0(PDF)\u00a0from the original on 24 June 2024. Retrieved\u00a024 June\u00a02024.<\/li>\n\n\n\n<li>^\u00a0Stokel-Walker, Chris (November 22, 2023).\u00a0<a href=\"https:\/\/www.scientificamerican.com\/article\/chatgpt-replicates-gender-bias-in-recommendation-letters\/\">&#8220;ChatGPT Replicates Gender Bias in Recommendation Letters&#8221;<\/a>.\u00a0<em>Scientific American<\/em>.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20231229043124\/https:\/\/www.scientificamerican.com\/article\/chatgpt-replicates-gender-bias-in-recommendation-letters\/\">Archived<\/a>\u00a0from the original on 2023-12-29. Retrieved\u00a02023-12-29.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-:1_147-0\">^<\/a><\/strong>\u00a0Luo, Queenie; Puett, Michael J.; Smith, Michael D. (2023-03-28). &#8220;A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.16281v2\">2303.16281v2<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CY\">cs.CY<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-148\">^<\/a><\/strong>\u00a0Cheng, Myra; Durmus, Esin; Jurafsky, Dan (2023-05-29),\u00a0<em>Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2305.18189\">2305.18189<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-149\">^<\/a><\/strong>\u00a0Kotek, Hadas; Dockum, Rikker; Sun, David (2023-11-05).\u00a0<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3582269.3615599\">&#8220;Gender bias and stereotypes in Large Language Models&#8221;<\/a>.\u00a0<em>Proceedings of the ACM Collective Intelligence Conference<\/em>. CI &#8217;23. New York, NY, USA: Association for Computing Machinery. pp.\u00a012\u201324.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1145%2F3582269.3615599\">10.1145\/3582269.3615599<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISBN_(identifier)\">ISBN<\/a>\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/979-8-4007-0113-9\"><bdi>979-8-4007-0113-9<\/bdi><\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-150\">^<\/a><\/strong>\u00a0Choi, Hyeong Kyu; Xu, Weijie; Xue, Chi; Eckman, Stephanie; Reddy, Chandan K. (2024-09-27),\u00a0<em>Mitigating Selection Bias with Node Pruning and Auxiliary Options<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2409.18857\">2409.18857<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-151\">^<\/a><\/strong>\u00a0Zheng, Chujie; Zhou, Hao; Meng, Fandong; Zhou, Jie; Huang, Minlie (2023-09-07),\u00a0<em>Large Language Models Are Not Robust Multiple Choice Selectors<\/em>,\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2309.03882\">2309.03882<\/a><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-152\">^<\/a><\/strong>\u00a0Heikkil\u00e4, Melissa (August 7, 2023).\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/08\/07\/1077324\/ai-language-models-are-rife-with-political-biases\/\">&#8220;AI language models are rife with different political biases&#8221;<\/a>.\u00a0<em>MIT Technology Review<\/em>. Retrieved\u00a02023-12-29.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-153\">^<\/a><\/strong>\u00a0Mehta, Sourabh (2024-07-03).\u00a0<a href=\"https:\/\/adasci.org\/how-much-energy-do-llms-consume-unveiling-the-power-behind-ai\/\">&#8220;How Much Energy Do LLMs Consume? Unveiling the Power Behind AI&#8221;<\/a>.\u00a0<em>Association of Data Scientists<\/em>. Retrieved\u00a02025-01-27.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-154\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.npr.org\/2024\/12\/09\/nx-s1-5171063\/artificial-intelligence-wants-to-go-nuclear-will-it-work\">&#8220;Artificial Intelligence wants to go nuclear. Will it work?&#8221;<\/a>.\u00a0<em>NPR<\/em>. Retrieved\u00a02025-01-27.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model#cite_ref-155\">^<\/a><\/strong>\u00a0Roy, Dareen (December 19, 2024).\u00a0<a href=\"https:\/\/www.reuters.com\/technology\/artificial-intelligence\/ais-energy-hunger-fuels-geothermal-startups-natgas-rivalry-clouds-future-2024-12-19\/\">&#8220;AI&#8217;s energy hunger fuels geothermal startups but natgas rivalry clouds future&#8221;<\/a>.\u00a0<em>Reuters<\/em>.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"\u53c2\u8003\u8d44\u6599\">4. \u4e2d\u6587\u8bcd\u6761\u53c2\u8003\u8d44\u6599<\/h2>\n\n\n\n<ol class=\"wp-block-list has-small-font-size\">\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-1\">^<\/a><\/strong>\u00a0Goled, Shraddha.\u00a0<a href=\"https:\/\/analyticsindiamag.com\/self-supervised-learning-vs-semi-supervised-learning-how-they-differ\/\">Self-Supervised Learning Vs Semi-Supervised Learning: How They Differ<\/a>. Analytics India Magazine. May 7, 2021\u00a0[2023-06-08]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230618152100\/https:\/\/analyticsindiamag.com\/self-supervised-learning-vs-semi-supervised-learning-how-they-differ\/\">\u5b58\u6863<\/a>\u4e8e2023-06-18\uff09.<\/li>\n\n\n\n<li>^\u00a0<a href=\"https:\/\/zh.wikipedia.org\/w\/index.php?title=Christopher_D._Manning&amp;action=edit&amp;redlink=1\">Manning, Christopher D.<\/a>\u00a0<a href=\"https:\/\/www.amacad.org\/publication\/human-language-understanding-reasoning\">Human Language Understanding &amp; Reasoning<\/a>. Daedalus. 2022,\u00a0<strong>151<\/strong>\u00a0(2): 127\u2013138\u00a0[2023-06-08].\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:248377870\">S2CID\u00a0248377870<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1162%2Fdaed_a_01905\">doi:10.1162\/daed_a_01905<\/a>. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230309154322\/https:\/\/www.amacad.org\/publication\/human-language-understanding-reasoning\">\u5b58\u6863<\/a>\u4e8e2023-03-09\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-extracting_3-0\">^<\/a><\/strong>\u00a0Carlini, Nicholas; Tramer, Florian; Wallace, Eric; Jagielski, Matthew; Herbert-Voss, Ariel; Lee, Katherine; Roberts, Adam; Brown, Tom B; Song, Dawn; Erlingsson, Ulfar.\u00a0<a href=\"https:\/\/www.usenix.org\/system\/files\/sec21-carlini-extracting.pdf\">Extracting Training Data from Large Language Models<\/a>\u00a0(PDF). USENIX Security Symposium\u00a0<strong>6<\/strong>. 2021\u00a0[2023-06-08]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20231221210608\/https:\/\/www.usenix.org\/system\/files\/sec21-carlini-extracting.pdf\">\u5b58\u6863<\/a>\u00a0(PDF)\u4e8e2023-12-21\uff09.<\/li>\n\n\n\n<li>^\u00a0Kotek, Hadas; Dockum, Rikker; Sun, David.\u00a0<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3582269.3615599\">Gender bias and stereotypes in Large Language Models<\/a>. Proceedings of The ACM Collective Intelligence Conference. CI &#8217;23 (New York, NY, USA: Association for Computing Machinery). 2023-11-05.\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/Special:%E7%BD%91%E7%BB%9C%E4%B9%A6%E6%BA%90\/979-8-4007-0113-9\">ISBN\u00a0979-8-4007-0113-9<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1145%2F3582269.3615599\">doi:10.1145\/3582269.3615599<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-5\">^<\/a><\/strong>\u00a0Davidson, Thomas; Bhattacharya, Debasmita; Weber, Ingmar. Roberts, Sarah T.; Tetreault, Joel; Prabhakaran, Vinodkumar; Waseem, Zeerak , \u7f16.\u00a0<a href=\"https:\/\/aclanthology.org\/W19-3504\">Racial Bias in Hate Speech and Abusive Language Detection Datasets<\/a>. Proceedings of the Third Workshop on Abusive Language Online (Florence, Italy: Association for Computational Linguistics). 2019-08.\u00a0<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2FW19-3504\">doi:10.18653\/v1\/W19-3504<\/a>.<\/li>\n\n\n\n<li>^\u00a0Queenie Luo; Michael J. Puett; Michael D. Smith.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2303.16281\">A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube<\/a>. arXiv. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240416094547\/https:\/\/arxiv.org\/abs\/2303.16281\">\u5b58\u6863<\/a>\u4e8e2024-04-16\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-7\">^<\/a><\/strong>\u00a0Goodman, Joshua, A Bit of Progress in Language Modeling, 2001-08-09,\u00a0<a href=\"https:\/\/ui.adsabs.harvard.edu\/abs\/2001cs........8005G\">Bibcode:2001cs&#8230;&#8230;..8005G<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/cs\/0108005\">arXiv:cs\/0108005<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\"><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-8\">^<\/a><\/strong>\u00a0Kilgarriff, Adam; Grefenstette, Gregory.\u00a0<a href=\"https:\/\/direct.mit.edu\/coli\/article\/29\/3\/333-347\/1816\">Introduction to the Special Issue on the Web as Corpus<\/a>. Computational Linguistics. September 2003,\u00a0<strong>29<\/strong>\u00a0(3): 333\u2013347.\u00a0<a href=\"https:\/\/www.worldcat.org\/issn\/0891-2017\">ISSN\u00a00891-2017<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1162%2F089120103322711569\">doi:10.1162\/089120103322711569<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-9\">^<\/a><\/strong>\u00a0Banko, Michele; Brill, Eric.\u00a0<a href=\"https:\/\/dx.doi.org\/10.3115\/1073012.1073017\">Scaling to very very large corpora for natural language disambiguation<\/a>. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics &#8211; ACL &#8217;01 (Morristown, NJ, USA: Association for Computational Linguistics). 2001: 26\u201333.\u00a0<a href=\"https:\/\/doi.org\/10.3115%2F1073012.1073017\">doi:10.3115\/1073012.1073017<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-10\">^<\/a><\/strong>\u00a0Resnik, Philip; Smith, Noah A.\u00a0<a href=\"https:\/\/direct.mit.edu\/coli\/article\/29\/3\/349-380\/1809\">The Web as a Parallel Corpus<\/a>. Computational Linguistics. September 2003,\u00a0<strong>29<\/strong>\u00a0(3): 349\u2013380\u00a0[2024-06-07].\u00a0<a href=\"https:\/\/www.worldcat.org\/issn\/0891-2017\">ISSN\u00a00891-2017<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1162%2F089120103322711578\">doi:10.1162\/089120103322711578<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240607172811\/https:\/\/direct.mit.edu\/coli\/article\/29\/3\/349-380\/1809\">\u5b58\u6863<\/a>\u4e8e2024-06-07\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-11\">^<\/a><\/strong>\u00a0Halevy, Alon; Norvig, Peter; Pereira, Fernando.\u00a0<a href=\"https:\/\/ieeexplore.ieee.org\/document\/4804817\">The Unreasonable Effectiveness of Data<\/a>. IEEE Intelligent Systems. March 2009,\u00a0<strong>24<\/strong>\u00a0(2): 8\u201312.\u00a0<a href=\"https:\/\/www.worldcat.org\/issn\/1541-1672\">ISSN\u00a01541-1672<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1109%2FMIS.2009.36\">doi:10.1109\/MIS.2009.36<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-12\">^<\/a><\/strong>\u00a0Chen, Leiyu; Li, Shaobo; Bai, Qiang; Yang, Jing; Jiang, Sanlong; Miao, Yanming. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sensing. 2021,\u00a0<strong>13<\/strong>\u00a0(22): 4712.\u00a0<a href=\"https:\/\/ui.adsabs.harvard.edu\/abs\/2021RemS...13.4712C\">Bibcode:2021RemS&#8230;13.4712C<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.3390%2Frs13224712\">doi:10.3390\/rs13224712<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-13\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/zh.wikipedia.org\/w\/index.php?title=Ashish_Vaswani&amp;action=edit&amp;redlink=1\">Vaswani, Ashish<\/a>; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion;\u00a0<a href=\"https:\/\/zh.wikipedia.org\/w\/index.php?title=Aidan_Gomez&amp;action=edit&amp;redlink=1\">Gomez, Aidan N<\/a>; Kaiser, \u0141ukasz; Polosukhin, Illia.\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf\">Attention is All you Need<\/a>\u00a0(PDF). Advances in Neural Information Processing Systems (Curran Associates, Inc.). 2017,\u00a0<strong>30<\/strong>\u00a0[2024-01-21]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240221141113\/https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf\">\u5b58\u6863<\/a>\u00a0(PDF)\u4e8e2024-02-21\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-14\">^<\/a><\/strong>\u00a0Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua. Neural Machine Translation by Jointly Learning to Align and Translate. 2014.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1409.0473\">arXiv:1409.0473<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-15\">^<\/a><\/strong>\u00a0Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna.\u00a0<a href=\"https:\/\/aclanthology.org\/2020.tacl-1.54\">A Primer in BERTology: What We Know About How BERT Works<\/a>. Transactions of the Association for Computational Linguistics. 2020,\u00a0<strong>8<\/strong>: 842\u2013866\u00a0[2024-01-21].\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:211532403\">S2CID\u00a0211532403<\/a>.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2002.12327\">arXiv:2002.12327<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">.\u00a0<a href=\"https:\/\/doi.org\/10.1162%2Ftacl_a_00349\">doi:10.1162\/tacl_a_00349<\/a>. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20220403103310\/https:\/\/aclanthology.org\/2020.tacl-1.54\/\">\u5b58\u6863<\/a>\u4e8e2022-04-03\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-16\">^<\/a><\/strong>\u00a0Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma.\u00a0<a href=\"https:\/\/aclanthology.org\/2024.naacl-long.67\">Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers<\/a>. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024: 1223\u20131243\u00a0[2024-12-08].\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2307.10700\">arXiv:2307.10700<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">.\u00a0<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2024.naacl-long.67\">doi:10.18653\/v1\/2024.naacl-long.67<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-17\">^<\/a><\/strong>\u00a0Hern, Alex.\u00a0<a href=\"https:\/\/www.theguardian.com\/technology\/2019\/feb\/14\/elon-musk-backed-ai-writes-convincing-news-fiction\">New AI fake text generator may be too dangerous to release, say creators<\/a>.\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/The_Guardian\">The Guardian<\/a>. 14 February 2019\u00a0[20 January\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20190214173112\/https:\/\/www.theguardian.com\/technology\/2019\/feb\/14\/elon-musk-backed-ai-writes-convincing-news-fiction\">\u5b58\u6863<\/a>\u4e8e14 February 2019\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-18\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.euronews.com\/next\/2023\/11\/30\/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months\">ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months<\/a>.\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/Euronews\">Euronews<\/a>. November 30, 2023\u00a0[January 20,\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240114025250\/https:\/\/www.euronews.com\/next\/2023\/11\/30\/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months\">\u5b58\u6863<\/a>\u4e8eJanuary 14, 2024\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-19\">^<\/a><\/strong>\u00a0Heaven, Will.\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/03\/14\/1069823\/gpt-4-is-bigger-and-better-chatgpt-openai\/\">GPT-4 is bigger and better than ChatGPT\u2014but OpenAI won&#8217;t say why<\/a>.\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/MIT_Technology_Review\">MIT Technology Review<\/a>. March 14, 2023\u00a0[January 20,\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230317224201\/https:\/\/www.technologyreview.com\/2023\/03\/14\/1069823\/gpt-4-is-bigger-and-better-chatgpt-openai\/\">\u5b58\u6863<\/a>\u4e8eMarch 17, 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-20\">^<\/a><\/strong>\u00a0Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma.\u00a0<a href=\"https:\/\/aclanthology.org\/2024.naacl-long.67\">Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers<\/a>. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024: 1223\u20131243\u00a0[2024-12-08].\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2307.10700\">arXiv:2307.10700<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">.\u00a0<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2024.naacl-long.67\">doi:10.18653\/v1\/2024.naacl-long.67<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-21\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/ourworldindata.org\/grapher\/artificial-intelligence-parameter-count?time=2017-09-05..latest\">Parameters in notable artificial intelligence systems<\/a>. ourworldindata.org. November 30, 2023\u00a0[January 20,\u00a02024].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-22\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/huggingface.co\/spaces\/lmsys\/chatbot-arena-leaderboard\">LMSYS Chatbot Arena Leaderboard<\/a>. huggingface.co.\u00a0[June 12,\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240610162906\/https:\/\/huggingface.co\/spaces\/lmsys\/chatbot-arena-leaderboard\">\u5b58\u6863<\/a>\u4e8eJune 10, 2024\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-23\">^<\/a><\/strong>\u00a0Sharma, Shubham.\u00a0<a href=\"https:\/\/venturebeat.com\/ai\/open-source-deepseek-r1-uses-pure-reinforcement-learning-to-match-openai-o1-at-95-less-cost\/\">Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 \u2014 at 95% less cost<\/a>. VentureBeat. 2025-01-20\u00a0[2025-01-26]\u00a0<strong>\uff08\u7f8e\u56fd\u82f1\u8bed\uff09<\/strong>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-24\">^<\/a><\/strong>\u00a0Zia, Dr Tehseen.\u00a0<a href=\"https:\/\/www.unite.ai\/unveiling-of-large-multimodal-models-shaping-the-landscape-of-language-models-in-2024\/\">Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024<\/a>. Unite.AI. 2024-01-08\u00a0[2024-12-28]\u00a0<strong>\uff08\u7f8e\u56fd\u82f1\u8bed\uff09<\/strong>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-25\">^<\/a><\/strong>\u00a0Peng, Bo; et al. RWKV: Reinventing RNNS for the Transformer Era. 2023.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2305.13048\">arXiv:2305.13048<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-26\">^<\/a><\/strong>\u00a0Merritt, Rick.\u00a0<a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/25\/what-is-a-transformer-model\/\">What Is a Transformer Model?<\/a>. NVIDIA Blog. 2022-03-25\u00a0[2023-07-25]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20231117203924\/https:\/\/blogs.nvidia.com\/blog\/what-is-a-transformer-model\/\">\u5b58\u6863<\/a>\u4e8e2023-11-17\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-27\">^<\/a><\/strong>\u00a0Gu, Albert; Dao, Tri, Mamba: Linear-Time Sequence Modeling with Selective State Spaces, 2023-12-01,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2312.00752\">arXiv:2312.00752<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\"><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-28\">^<\/a><\/strong>\u00a0Kaushal, Ayush; Mahowald, Kyle, What do tokens know about their characters and how do they know it?, 2022-06-06,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2206.02608\">arXiv:2206.02608<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\"><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-29\">^<\/a><\/strong>\u00a0Yennie Jun.\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230817165705\/https:\/\/blog.yenniejun.com\/p\/all-languages-are-not-created-tokenized\">All languages are NOT created (tokenized) equal<\/a>. Language models cost much more in some languages than others. 2023-05-03\u00a0[2023-08-17]. \uff08<a href=\"https:\/\/blog.yenniejun.com\/p\/all-languages-are-not-created-tokenized\">\u539f\u59cb\u5185\u5bb9<\/a>\u5b58\u6863\u4e8e2023-08-17\uff09.\u00a0<q>In other words, to express the same sentiment, some languages require up to 10 times more tokens.<\/q><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-30\">^<\/a><\/strong>\u00a0Petrov, Aleksandar; Malfa, Emanuele La; Torr, Philip; Bibi, Adel.\u00a0<a href=\"https:\/\/openreview.net\/forum?id=Pj4YYuxTq9\">Language Model Tokenizers Introduce Unfairness Between Languages<\/a>. NeurIPS. June 23, 2023\u00a0[September 16,\u00a02023].\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2305.15425\">arXiv:2305.15425<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20231215212906\/https:\/\/openreview.net\/forum?id=Pj4YYuxTq9\">\u5b58\u6863<\/a>\u4e8eDecember 15, 2023\uff09 \u2013\u901a\u8fc7openreview.net.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-xbiWb_31-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/web.archive.org\/web\/20230423211308\/https:\/\/platform.openai.com\/tokenizer\">OpenAI API<\/a>. platform.openai.com.\u00a0[2023-04-30]. \uff08<a href=\"https:\/\/platform.openai.com\/\">\u539f\u59cb\u5185\u5bb9<\/a>\u5b58\u6863\u4e8eApril 23, 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-2022Book_32-0\">^<\/a><\/strong>\u00a0Paa\u00df, Gerhard; Giesselbach, Sven.\u00a0<a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-23190-2_2\">Pre-trained Language Models<\/a>. Foundation Models for Natural Language Processing. Artificial Intelligence: Foundations, Theory, and Algorithms. 2022: 19\u201378\u00a0[3 August\u00a02023].\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/Special:%E7%BD%91%E7%BB%9C%E4%B9%A6%E6%BA%90\/9783031231902\">ISBN\u00a09783031231902<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1007%2F978-3-031-23190-2_2\">doi:10.1007\/978-3-031-23190-2_2<\/a>. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230803212329\/https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-23190-2_2\">\u5b58\u6863<\/a>\u4e8e3 August 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-33\">^<\/a><\/strong>\u00a0Petrov, Aleksandar; Emanuele La Malfa; Torr, Philip H. S.; Bibi, Adel. Language Model Tokenizers Introduce Unfairness Between Languages. 2023.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2305.15425\">arXiv:2305.15425<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-34\">^<\/a><\/strong>\u00a0Lundberg, Scott.\u00a0<a href=\"https:\/\/towardsdatascience.com\/the-art-of-prompt-design-prompt-boundaries-and-token-healing-3b2448b0be38\">The Art of Prompt Design: Prompt Boundaries and Token Healing<\/a>. Medium. 2023-12-12\u00a0[2024-08-05]\u00a0<strong>\uff08\u82f1\u8bed\uff09<\/strong>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-aYNg4_35-0\">^<\/a><\/strong>\u00a0Dodge, Jesse; Sap, Maarten; Marasovi\u0107, Ana; Agnew, William; Ilharco, Gabriel; Groeneveld, Dirk; Mitchell, Margaret; Gardner, Matt. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus. 2021.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2104.08758\">arXiv:2104.08758<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-36\">^<\/a><\/strong>\u00a0Lee, Katherine; Ippolito, Daphne; Nystrom, Andrew; Zhang, Chiyuan; Eck, Douglas; Callison-Burch, Chris; Carlini, Nicholas.\u00a0<a href=\"https:\/\/aclanthology.org\/2022.acl-long.577.pdf\">Deduplicating Training Data Makes Language Models Better<\/a>\u00a0(PDF). Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. May 2022,. 1: Long Papers: 8424\u20138445.\u00a0<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2022.acl-long.577\">doi:10.18653\/v1\/2022.acl-long.577<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-37\">^<\/a><\/strong>\u00a0Li, Yuanzhi; Bubeck, S\u00e9bastien; Eldan, Ronen; Del Giorno, Allie; Gunasekar, Suriya; Lee, Yin Tat, Textbooks Are All You Need II: phi-1.5 technical report, 2023-09-11,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2309.05463\">arXiv:2309.05463<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\"><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-38\">^<\/a><\/strong>\u00a0Lin, Zhenghao; Gou, Zhibin; Gong, Yeyun; Liu, Xiao; Shen, Yelong; Xu, Ruochen; Lin, Chen; Yang, Yujiu; Jiao, Jian. Rho-1: Not All Tokens Are What You Need. 2024-04-11.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2404.07965\">arXiv:2404.07965<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-qbFw1_39-0\">^<\/a><\/strong>\u00a0Brown, Tom B.; et al. Language Models are Few-Shot Learners. 2020.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2005.14165\">arXiv:2005.14165<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-40\">^<\/a><\/strong>\u00a0Abdin, Marah; Jacobs, Sam Ade; Awan, Ammar Ahmad; Aneja, Jyoti; Awadallah, Ahmed; Awadalla, Hany; Bach, Nguyen; Bahree, Amit; Bakhtiari, Arash. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. 2024-04-23.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2404.14219\">arXiv:2404.14219<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-41\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.ibm.com\/topics\/instruction-tuning\">What is instruction tuning?<\/a>. IBM.\u00a0[2024-12-09].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-instructGPT-paper_42-0\">^<\/a><\/strong>\u00a0Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan. Training language models to follow instructions with human feedback. 2022.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2203.02155\">arXiv:2203.02155<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-HGZCJ_43-0\">^<\/a><\/strong>\u00a0Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. 2017-01-01.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1701.06538\">arXiv:1701.06538<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-R9Qq5_44-0\">^<\/a><\/strong>\u00a0Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. 2021-01-12.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2006.16668\">arXiv:2006.16668<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-emergentpaper_45-0\">^<\/a><\/strong>\u00a0Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William.\u00a0<a href=\"https:\/\/openreview.net\/forum?id=yzkSU5zdwD\">Emergent Abilities of Large Language Models<\/a>. Transactions on Machine Learning Research. 31 August 2022\u00a0[19 March\u00a02023].\u00a0<a href=\"https:\/\/www.worldcat.org\/issn\/2835-8856\">ISSN\u00a02835-8856<\/a>. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230322210052\/https:\/\/openreview.net\/forum?id=yzkSU5zdwD\">\u5b58\u6863<\/a>\u4e8e22 March 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-Jay_Allamar_46-0\">^<\/a><\/strong>\u00a0Allamar, Jay.\u00a0<a href=\"https:\/\/jalammar.github.io\/illustrated-transformer\/\">Illustrated transformer<\/a>.\u00a0[2023-07-29]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230725230033\/http:\/\/jalammar.github.io\/illustrated-transformer\/\">\u5b58\u6863<\/a>\u4e8e2023-07-25\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-Jay_Allamar_GPT2_47-0\">^<\/a><\/strong>\u00a0Allamar, Jay.\u00a0<a href=\"https:\/\/jalammar.github.io\/illustrated-gpt2\/\">The Illustrated GPT-2 (Visualizing Transformer Language Models)<\/a>.\u00a0[2023-08-01].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-2022Book_2_48-0\">^<\/a><\/strong>\u00a0Paa\u00df, Gerhard; Giesselbach, Sven.\u00a0<a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-23190-2_2\">Pre-trained Language Models<\/a>. Foundation Models for Natural Language Processing. Artificial Intelligence: Foundations, Theory, and Algorithms. 2022: 19\u201378\u00a0[3 August\u00a02023].\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/Special:%E7%BD%91%E7%BB%9C%E4%B9%A6%E6%BA%90\/9783031231902\">ISBN\u00a09783031231902<\/a>.\u00a0<a href=\"https:\/\/doi.org\/10.1007%2F978-3-031-23190-2_2\">doi:10.1007\/978-3-031-23190-2_2<\/a>. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230803212329\/https:\/\/link.springer.com\/chapter\/10.1007\/978-3-031-23190-2_2\">\u5b58\u6863<\/a>\u4e8e3 August 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-49\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/blog.google\/technology\/ai\/google-gemini-next-generation-model-february-2024\/#context-window\">Our next-generation model: Gemini 1.5<\/a>. Google. 15 February 2024\u00a0[18 February\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240218141522\/https:\/\/blog.google\/technology\/ai\/google-gemini-next-generation-model-february-2024\/#context-window\">\u5b58\u6863<\/a>\u4e8e18 February 2024\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-50\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/www.anthropic.com\/news\/claude-2-1-prompting\">Long context prompting for Claude 2.1<\/a>. December 6, 2023\u00a0[January 20,\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240827053830\/https:\/\/www.anthropic.com\/news\/claude-2-1-prompting\">\u5b58\u6863<\/a>\u4e8eAugust 27, 2024\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-51\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/platform.openai.com\/docs\/guides\/rate-limits\">Rate limits<\/a>. openai.com.\u00a0[January 20,\u00a02024]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240202003219\/https:\/\/platform.openai.com\/docs\/guides\/rate-limits\">\u5b58\u6863<\/a>\u4e8eFebruary 2, 2024\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-ioUpE_52-0\">^<\/a><\/strong>\u00a0Zaib, Munazza; Sheng, Quan Z.; Emma Zhang, Wei.\u00a0<a href=\"https:\/\/www.researchgate.net\/publication\/338931711\">A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP<\/a>. Proceedings of the Australasian Computer Science Week Multiconference. 4 February 2020: 1\u20134.\u00a0<a href=\"https:\/\/zh.wikipedia.org\/wiki\/Special:%E7%BD%91%E7%BB%9C%E4%B9%A6%E6%BA%90\/9781450376976\">ISBN\u00a09781450376976<\/a>.\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:211040895\">S2CID\u00a0211040895<\/a>.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2104.10810\">arXiv:2104.10810<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">.\u00a0<a href=\"https:\/\/doi.org\/10.1145%2F3373017.3373028\">doi:10.1145\/3373017.3373028<\/a>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-jm_53-0\">^<\/a><\/strong>\u00a0Jurafsky, Dan; Martin, James H.\u00a0<a href=\"https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\">Speech and Language Processing<\/a>\u00a0(PDF)\u00a03rd edition draft. 7 January 2023\u00a0[24 May\u00a02022]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230323210221\/https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\">\u5b58\u6863<\/a>\u00a0(PDF)\u4e8e23 March 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-jm2_54-0\">^<\/a><\/strong>\u00a0Jurafsky, Dan; Martin, James H.\u00a0<a href=\"https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\">Speech and Language Processing<\/a>\u00a0(PDF)\u00a03rd edition draft. 7 January 2023\u00a0[24 May\u00a02022]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230323210221\/https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\">\u5b58\u6863<\/a>\u00a0(PDF)\u4e8e23 March 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-Wiggers_55-0\">^<\/a><\/strong>\u00a0Wiggers, Kyle.\u00a0<a href=\"https:\/\/techcrunch.com\/2022\/04\/28\/the-emerging-types-of-language-models-and-why-they-matter\/\">The emerging types of language models and why they matter<\/a>. TechCrunch. 28 April 2022\u00a0[9 March\u00a02023]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230316072443\/https:\/\/techcrunch.com\/2022\/04\/28\/the-emerging-types-of-language-models-and-why-they-matter\/\">\u5b58\u6863<\/a>\u4e8e16 March 2023\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-xaytj_56-0\">^<\/a><\/strong>\u00a0Sharir, Or; Peleg, Barak; Shoham, Yoav. The Cost of Training NLP Models: A Concise Overview. 2020.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2004.08900\">arXiv:2004.08900<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-Pythia_57-0\">^<\/a><\/strong>\u00a0Biderman, Stella; Schoelkopf, Hailey; Anthony, Quentin; Bradley, Herbie; Khan, Mohammad Aflah; Purohit, Shivanshu; Prashanth, USVSN Sai. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. April 2023.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2304.01373\">arXiv:2304.01373<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-58\">^<\/a><\/strong>\u00a0Maslej, Nestor; Fattorini, Loredana; Brynjolfsson, Erik; Etchemendy, John; Ligett, Katrina; Lyons, Terah; Manyika, James; Ngo, Helen; Niebles, Juan Carlos, Artificial Intelligence Index Report 2023, 2023-10-05,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2310.03715\">arXiv:2310.03715<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\"><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-kaplan-scaling_59-0\">^<\/a><\/strong>\u00a0Section 2.1 and Table 1,\u00a0Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario. Scaling Laws for Neural Language Models. 2020.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2001.08361\">arXiv:2001.08361<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.LG\">cs.LG<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-60\">^<\/a><\/strong>\u00a0Kiros, Ryan; Salakhutdinov, Ruslan; Zemel, Rich.\u00a0<a href=\"https:\/\/proceedings.mlr.press\/v32\/kiros14.html\">Multimodal Neural Language Models<\/a>. Proceedings of the 31st International Conference on Machine Learning (PMLR). 2014-06-18: 595\u2013603\u00a0[2023-07-02]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230702195952\/https:\/\/proceedings.mlr.press\/v32\/kiros14.html\">\u5b58\u6863<\/a>\u4e8e2023-07-02\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-61\">^<\/a><\/strong>\u00a0Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E.\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2012\/hash\/c399862d3b9d6b76c8436e924a68c45b-Abstract.html\">ImageNet Classification with Deep Convolutional Neural Networks<\/a>. Advances in Neural Information Processing Systems (Curran Associates, Inc.). 2012,\u00a0<strong>25<\/strong>\u00a0[2023-07-02]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230702195952\/https:\/\/proceedings.neurips.cc\/paper\/2012\/hash\/c399862d3b9d6b76c8436e924a68c45b-Abstract.html\">\u5b58\u6863<\/a>\u4e8e2023-07-02\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-62\">^<\/a><\/strong>\u00a0Antol, Stanislaw; Agrawal, Aishwarya; Lu, Jiasen; Mitchell, Margaret; Batra, Dhruv; Zitnick, C. Lawrence; Parikh, Devi.\u00a0<a href=\"https:\/\/openaccess.thecvf.com\/content_iccv_2015\/html\/Antol_VQA_Visual_Question_ICCV_2015_paper.html\">VQA: Visual Question Answering<\/a>. ICCV. 2015: 2425\u20132433\u00a0[2023-07-02]. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230702195952\/https:\/\/openaccess.thecvf.com\/content_iccv_2015\/html\/Antol_VQA_Visual_Question_ICCV_2015_paper.html\">\u5b58\u6863<\/a>\u4e8e2023-07-02\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-63\">^<\/a><\/strong>\u00a0Li, Junnan; Li, Dongxu; Savarese, Silvio; Hoi, Steven. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. 2023-01-01.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2301.12597\">arXiv:2301.12597<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CV\">cs.CV<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-64\">^<\/a><\/strong>\u00a0Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao.\u00a0<a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/hash\/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html\">Flamingo: a Visual Language Model for Few-Shot Learning<\/a>. Advances in Neural Information Processing Systems. 2022-12-06,\u00a0<strong>35<\/strong>: 23716\u201323736\u00a0[2023-07-02].\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2204.14198\">arXiv:2204.14198<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20230702195951\/https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/hash\/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html\">\u5b58\u6863<\/a>\u4e8e2023-07-02\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-65\">^<\/a><\/strong>\u00a0Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae. Visual Instruction Tuning. 2023-04-01.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2304.08485\">arXiv:2304.08485<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CV\">cs.CV<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-66\">^<\/a><\/strong>\u00a0Zhang, Hang; Li, Xin; Bing, Lidong. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding. 2023-06-01.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2306.02858\">arXiv:2306.02858<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-67\">^<\/a><\/strong>\u00a0OpenAI. GPT-4 Technical Report. 2023-03-27.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2303.08774\">arXiv:2303.08774<\/a>\u202f<img loading=\"lazy\" decoding=\"async\" width=\"9\" height=\"14\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/14px-Lock-green.svg.png 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/18px-Lock-green.svg.png 2x\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/6\/65\/Lock-green.svg\/9px-Lock-green.svg.png\" alt=\"\u53ef\u514d\u8d39\u67e5\u9605\">\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-68\">^<\/a><\/strong>\u00a0OpenAI.\u00a0<a href=\"https:\/\/cdn.openai.com\/papers\/GPTV_System_Card.pdf\">GPT-4V(ision) System Card<\/a>\u00a0(PDF). September 25, 2023.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-69\">^<\/a><\/strong>\u00a0Pichai, Sundar,\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=cNfINi5CNbY&amp;t=931s\">Google Keynote (Google I\/O &#8217;23)<\/a>, timestamp 15:31, 10 May 2023\u00a0[2023-07-02]<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-70\">^<\/a><\/strong>\u00a0Wiggers, Kyle.\u00a0<a href=\"https:\/\/techcrunch.com\/2024\/09\/11\/mistral-releases-pixtral-its-first-multimodal-model\/?utm_medium=aisecret.us&amp;utm_source=aisecret.us&amp;utm_campaign=aisecret.us\">Mistral releases Pixtral 12B, its first multimodal model<\/a>. TechCrunch. 11 September 2024\u00a0[14 September\u00a02024].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-openai-o1_71-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/openai.com\/index\/introducing-openai-o1-preview\/\">Introducing OpenAI o1-preview<\/a>. OpenAI. 2024-09-12\u00a0[2025-02-03].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-openai-o12_72-0\">^<\/a><\/strong>\u00a0<a href=\"https:\/\/openai.com\/index\/introducing-openai-o1-preview\/\">Introducing OpenAI o1-preview<\/a>. OpenAI. 2024-09-12\u00a0[2025-02-03].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-nyt-o3_73-0\">^<\/a><\/strong>\u00a0Metz, Cade.\u00a0<a href=\"https:\/\/www.nytimes.com\/2024\/12\/20\/technology\/openai-new-ai-math-science.html\">OpenAI Unveils New A.I. That Can &#8216;Reason&#8217; Through Math and Science Problems<\/a>. The New York Times. 2024-12-20\u00a0[2025-02-03].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-nature-deepseek_74-0\">^<\/a><\/strong>\u00a0Gibney, Elizabeth.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/d41586-025-00229-6\">China&#8217;s cheap, open AI model DeepSeek thrills scientists<\/a>. Nature. 2025-01-30\u00a0[2025-02-03].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-nyt-o32_75-0\">^<\/a><\/strong>\u00a0Metz, Cade.\u00a0<a href=\"https:\/\/www.nytimes.com\/2024\/12\/20\/technology\/openai-new-ai-math-science.html\">OpenAI Unveils New A.I. That Can &#8216;Reason&#8217; Through Math and Science Problems<\/a>. The New York Times. 2024-12-20\u00a0[2025-02-03].<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-76\">^<\/a><\/strong>\u00a0Lei Huang; Weijiang Yu; Weitao Ma.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2311.05232\">A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions<\/a>. arXiv. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240416094547\/https:\/\/arxiv.org\/abs\/2311.05232\">\u5b58\u6863<\/a>\u4e8e2024-11-28\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-77\">^<\/a><\/strong>\u00a0Yucong Duan; Fuliang Tang; Zhendong Guo; Yingtian Mei; Yuxing Wang; Kunguang Wu; Zeyu Yang; Shuaishuai Huang; Shiming Gong.\u00a0<a href=\"https:\/\/rgdoi.net\/10.13140\/RG.2.2.12894.61762\">Global Large Language Model EQ and IQ Bias Evaluation -Released by DIKWP -AC Research Group<\/a>. ResearchGate. 2023.\u00a0<a href=\"https:\/\/doi.org\/10.13140%2FRG.2.2.12894.61762\">doi:10.13140\/RG.2.2.12894.61762<\/a>\u00a0\u2013\u901a\u8fc7ResearchGate\u00a0<strong>\uff08\u82f1\u8bed\uff09<\/strong>.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-78\">^<\/a><\/strong>\u00a0Zhou, Karen; Tan, Chenhao. Bouamor, Houda; Pino, Juan; Bali, Kalika , \u7f16.\u00a0<a href=\"https:\/\/aclanthology.org\/2023.findings-emnlp.696\">Entity-Based Evaluation of Political Bias in Automatic Summarization<\/a>. Findings of the Association for Computational Linguistics: EMNLP 2023 (Singapore: Association for Computational Linguistics). 2023-12\u00a0[2023-12-26].\u00a0<a href=\"https:\/\/doi.org\/10.18653%2Fv1%2F2023.findings-emnlp.696\">doi:10.18653\/v1\/2023.findings-emnlp.696<\/a>. \uff08\u539f\u59cb\u5185\u5bb9<a href=\"https:\/\/web.archive.org\/web\/20240424141927\/https:\/\/aclanthology.org\/2023.findings-emnlp.696\/\">\u5b58\u6863<\/a>\u4e8e2024-04-24\uff09.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-79\">^<\/a><\/strong>\u00a0Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong.\u00a0<a href=\"https:\/\/rgdoi.net\/10.13140\/RG.2.2.26652.67200\">&#8220;Ranking of Large Language Model (LLM) Cultural Bias&#8221; &#8211;DIKWP Research Group International Standard Evaluation<\/a>. ResearchGate. 2024.\u00a0<a href=\"https:\/\/doi.org\/10.13140%2FRG.2.2.26652.67200\">doi:10.13140\/RG.2.2.26652.67200<\/a>\u00a0\u2013\u901a\u8fc7ResearchGate.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-80\">^<\/a><\/strong>\u00a0Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong.\u00a0<a href=\"https:\/\/rgdoi.net\/10.13140\/RG.2.2.10019.63529\">&#8220;Ranking of Large Language Model (LLM) Regional Bias&#8221; &#8211;DIKWP Research Group International Standard Evaluation<\/a>. ResearchGate. 2024.\u00a0<a href=\"https:\/\/doi.org\/10.13140%2FRG.2.2.10019.63529\">doi:10.13140\/RG.2.2.10019.63529<\/a>\u00a0\u2013\u901a\u8fc7ResearchGate.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-81\">^<\/a><\/strong>\u00a0Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong.\u00a0<a href=\"https:\/\/rgdoi.net\/10.13140\/RG.2.2.26397.12006\">&#8220;The Large Language Model (LLM) Bias Evaluation (Age Bias)&#8221; &#8211;DIKWP Research Group International Standard Evaluation<\/a>. ResearchGate. 2024.\u00a0<a href=\"https:\/\/doi.org\/10.13140%2FRG.2.2.26397.12006\">doi:10.13140\/RG.2.2.26397.12006<\/a>\u00a0\u2013\u901a\u8fc7ResearchGate.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B#cite_ref-82\">^<\/a><\/strong>\u00a0Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong.\u00a0<a href=\"https:\/\/rgdoi.net\/10.13140\/RG.2.2.23041.67689\">&#8220;The Large Language Model (LLM) Bias Evaluation (Occupational Bias)&#8221; &#8211;DIKWP Research Group International Standard Evaluation<\/a>. ResearchGate. 2024.\u00a0<a href=\"https:\/\/doi.org\/10.13140%2FRG.2.2.23041.67689\">doi:10.13140\/RG.2.2.23041.67689<\/a>\u00a0\u2013\u901a\u8fc7ResearchGate.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">5. \u5ef6\u4f38\u9605\u8bfb | Further reading<\/h2>\n\n\n\n<ul class=\"wp-block-list has-small-font-size\">\n<li><a href=\"https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/open_llm_leaderboard\">Open LLM Leaderboard\uff08\u5f00\u653eLLM\u6392\u884c\u699c\u65e8\u5728\u8ddf\u8e2a\u3001\u6392\u540d\u548c\u8bc4\u4f30\u5f00\u653eLLM\u548c\u804a\u5929\u673a\u5668\u4eba\uff09<\/a>&nbsp;\uff08<a href=\"https:\/\/web.archive.org\/web\/20231222103236\/https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/open_llm_leaderboard\">\u9875\u9762\u5b58\u6863\u5907\u4efd<\/a>\uff0c\u5b58\u4e8e<a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E4%BA%92%E8%81%94%E7%BD%91%E6%A1%A3%E6%A1%88%E9%A6%86\">\u4e92\u8054\u7f51\u6863\u6848\u9986<\/a>\uff09<\/li>\n\n\n\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Dan_Jurafsky\">Jurafsky, Dan<\/a>, Martin, James. H.\u00a0<a href=\"https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book_jan72023.pdf\"><em>Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition<\/em><\/a>, 3rd Edition draft, 2023.<\/li>\n\n\n\n<li>Zhao, Wayne Xin; et\u00a0al. (2023). &#8220;A Survey of Large Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2303.18223\">2303.18223<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li>Kaddour, Jean; et\u00a0al. (2023). &#8220;Challenges and Applications of Large Language Models&#8221;.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2307.10169\">2307.10169<\/a>\u00a0[<a href=\"https:\/\/arxiv.org\/archive\/cs.CL\">cs.CL<\/a>].<\/li>\n\n\n\n<li>Yin, Shukang; Fu, Chaoyou; Zhao, Sirui; Li, Ke; Sun, Xing; Xu, Tong; Chen, Enhong (2024).\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11645129\">&#8220;A Survey on Multimodal Large Language Models&#8221;<\/a>.\u00a0<em>National Science Review<\/em>.\u00a0<strong>11<\/strong>\u00a0(12): nwae403.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ArXiv_(identifier)\">arXiv<\/a>:<a href=\"https:\/\/arxiv.org\/abs\/2306.13549\">2306.13549<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1093%2Fnsr%2Fnwae403\">10.1093\/nsr\/nwae403<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/PMC_(identifier)\">PMC<\/a>\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11645129\">11645129<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/PMID_(identifier)\">PMID<\/a>\u00a0<a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/39679213\">39679213<\/a>.<\/li>\n\n\n\n<li><a href=\"https:\/\/aiindex.stanford.edu\/report\/\">&#8220;AI Index Report 2024 \u2013 Artificial Intelligence Index&#8221;<\/a>.\u00a0<em>aiindex.stanford.edu<\/em>. Retrieved\u00a02024-05-05.<\/li>\n\n\n\n<li>Frank, Michael C. (27 June 2023).\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s44159-023-00211-x\">&#8220;Baby steps in evaluating the capacities of large language models&#8221;<\/a>.\u00a0<em>Nature Reviews Psychology<\/em>.\u00a0<strong>2<\/strong>\u00a0(8):\u00a0451\u2013452.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\">doi<\/a>:<a href=\"https:\/\/doi.org\/10.1038%2Fs44159-023-00211-x\">10.1038\/s44159-023-00211-x<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ISSN_(identifier)\">ISSN<\/a>\u00a0<a href=\"https:\/\/search.worldcat.org\/issn\/2731-0574\">2731-0574<\/a>.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/S2CID_(identifier)\">S2CID<\/a>\u00a0<a href=\"https:\/\/api.semanticscholar.org\/CorpusID:259713140\">259713140<\/a>. Retrieved\u00a02 July\u00a02023.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>\u4e2d\u6587\u8bcd\u6761\u539f\u6587\u94fe\u63a5\uff08\u65e0\u6cd5\u4ece\u4e2d\u56fd\u5185\u5730\u8bbf\u95ee\uff09\uff1a\u8bf7\u70b9 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[132],"tags":[155,101],"class_list":["post-5525","post","type-post","status-publish","format-standard","hentry","category-ai","tag-155","tag-101"],"_links":{"self":[{"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/posts\/5525","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/comments?post=5525"}],"version-history":[{"count":0,"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/posts\/5525\/revisions"}],"wp:attachment":[{"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/media?parent=5525"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/categories?post=5525"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cathayvista.top\/index.php\/wp-json\/wp\/v2\/tags?post=5525"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}