Link to the original Chinese entry (not accessible from mainland China): click here to visit
Link to the original English entry (not accessible from mainland China): click here to visit
This article follows the English entry and supplements it with some content from the Chinese entry (where the two conflict, the more recently updated version prevails). When mirroring the text, 辽观 made the compliance edits required for it to be uploaded in mainland China; some words are replaced with Hanyu Pinyin, with the digit after each syllable indicating the tone under Pinyin rules. To learn more about 辽观's Wikipedia mirroring project and the other entries already mirrored, click here. Wikipedia is an Internet encyclopedia project of the Wikimedia Foundation, a US organization; its content may be influenced by editorial standpoints, sources and other factors, so please view it objectively. The body text does not represent the translator's views.
The translation provided by 辽观 is for reference only. The text may contain links that cannot be accessed from mainland China.
The entry text mirrored by 辽观, like Wikipedia itself, is released under the CC BY-SA 4.0 license (covering 辽观's Chinese-English bilingual version); you may use its content free of charge, including commercially, as long as you comply with the license. Images and videos may be covered by different licenses. Click here to visit.
Contents
1. Main text (published on the Zhihu column)
2. See also (related Wikipedia entries)
- Chatbot
- Language model
- GPT-4 (OpenAI)
- LLaMA (Meta)
- LaMDA (Google)
- Gemini (Google)
- Foundation models
- List of large language models
- List of chatbots
3. References of the English entry
- ^ Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (Dec 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.). “Language Models are Few-Shot Learners” (PDF). Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 1877–1901. Archived (PDF) from the original on 2023-11-17. Retrieved 2023-03-14.
- ^ Fathallah, Nadeen; Das, Arunav; De Giorgis, Stefano; Poltronieri, Andrea; Haase, Peter; Kovriguina, Liubov (2024-05-26). NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning (PDF). Extended Semantic Web Conference 2024. Hersonissos, Greece.
- ^ Manning, Christopher D. (2022). “Human Language Understanding & Reasoning”. Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870. Archived from the original on 2023-11-17. Retrieved 2023-03-09.
- ^ Goodman, Joshua (2001-08-09), A Bit of Progress in Language Modeling, arXiv:cs/0108005, Bibcode:2001cs........8005G
- ^ Kilgarriff, Adam; Grefenstette, Gregory (September 2003). “Introduction to the Special Issue on the Web as Corpus”. Computational Linguistics. 29 (3): 333–347. doi:10.1162/089120103322711569. ISSN 0891-2017.
- ^ Banko, Michele; Brill, Eric (2001). “Scaling to very very large corpora for natural language disambiguation”. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics – ACL ’01. Morristown, NJ, USA: Association for Computational Linguistics: 26–33. doi:10.3115/1073012.1073017.
- ^ Resnik, Philip; Smith, Noah A. (September 2003). “The Web as a Parallel Corpus”. Computational Linguistics. 29 (3): 349–380. doi:10.1162/089120103322711578. ISSN 0891-2017. Archived from the original on 2024-06-07. Retrieved 2024-06-07.
- ^ Halevy, Alon; Norvig, Peter; Pereira, Fernando (March 2009). “The Unreasonable Effectiveness of Data”. IEEE Intelligent Systems. 24 (2): 8–12. doi:10.1109/MIS.2009.36. ISSN 1541-1672.
- ^ Chen, Leiyu; Li, Shaobo; Bai, Qiang; Yang, Jing; Jiang, Sanlong; Miao, Yanming (2021). “Review of Image Classification Algorithms Based on Convolutional Neural Networks”. Remote Sensing. 13 (22): 4712. Bibcode:2021RemS...13.4712C. doi:10.3390/rs13224712.
- ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). “Attention is All you Need” (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. Archived (PDF) from the original on 2024-02-21. Retrieved 2024-01-21.
- ^ Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (2014). “Neural Machine Translation by Jointly Learning to Align and Translate”. arXiv:1409.0473 [cs.CL].
- ^ Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). “A Primer in BERTology: What We Know About How BERT Works”. Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403. Archived from the original on 2022-04-03. Retrieved 2024-01-21.
- ^ Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma (2024). “Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers”. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pp. 1223–1243. arXiv:2307.10700. doi:10.18653/v1/2024.naacl-long.67. Retrieved 2024-12-08.
- ^ Hern, Alex (14 February 2019). “New AI fake text generator may be too dangerous to release, say creators”. The Guardian. Archived from the original on 14 February 2019. Retrieved 20 January 2024.
- ^ “ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months”. Euronews. November 30, 2023. Archived from the original on January 14, 2024. Retrieved January 20, 2024.
- ^ Heaven, Will (March 14, 2023). “GPT-4 is bigger and better than ChatGPT—but OpenAI won’t say why”. MIT Technology Review. Archived from the original on March 17, 2023. Retrieved January 20, 2024.
- ^ Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma (2024). “Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers”. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pp. 1223–1243. arXiv:2307.10700. doi:10.18653/v1/2024.naacl-long.67. Retrieved 2024-12-08.
- ^ “Parameters in notable artificial intelligence systems”. ourworldindata.org. November 30, 2023. Retrieved January 20, 2024.
- ^ Sharma, Shubham (2025-01-20). “Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost”. VentureBeat. Retrieved 2025-01-26.
- ^ Zia, Dr Tehseen (2024-01-08). “Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024”. Unite.AI. Retrieved 2024-12-28.
- ^ Peng, Bo; et al. (2023). “RWKV: Reinventing RNNs for the Transformer Era”. arXiv:2305.13048 [cs.CL].
- ^ Merritt, Rick (2022-03-25). “What Is a Transformer Model?”. NVIDIA Blog. Archived from the original on 2023-11-17. Retrieved 2023-07-25.
- ^ Gu, Albert; Dao, Tri (2023-12-01), Mamba: Linear-Time Sequence Modeling with Selective State Spaces, arXiv:2312.00752
- ^ Kaushal, Ayush; Mahowald, Kyle (2022-06-06), What do tokens know about their characters and how do they know it?, arXiv:2206.02608
- ^ Yennie Jun (2023-05-03). “All languages are NOT created (tokenized) equal”. Language models cost much more in some languages than others. Archived from the original on 2023-08-17. Retrieved 2023-08-17.
In other words, to express the same sentiment, some languages require up to 10 times more tokens.
- ^ Petrov, Aleksandar; Malfa, Emanuele La; Torr, Philip; Bibi, Adel (June 23, 2023). “Language Model Tokenizers Introduce Unfairness Between Languages”. NeurIPS. arXiv:2305.15425. Archived from the original on December 15, 2023. Retrieved September 16, 2023 – via openreview.net.
- ^ “OpenAI API”. platform.openai.com. Archived from the original on April 23, 2023. Retrieved 2023-04-30.
- ^ Paaß, Gerhard; Giesselbach, Sven (2022). “Pre-trained Language Models”. Foundation Models for Natural Language Processing. Artificial Intelligence: Foundations, Theory, and Algorithms. pp. 19–78. doi:10.1007/978-3-031-23190-2_2. ISBN 9783031231902. Archived from the original on 3 August 2023. Retrieved 3 August 2023.
- ^ Petrov, Aleksandar; Emanuele La Malfa; Torr, Philip H. S.; Bibi, Adel (2023). “Language Model Tokenizers Introduce Unfairness Between Languages”. arXiv:2305.15425 [cs.CL].
- ^ Lundberg, Scott (2023-12-12). “The Art of Prompt Design: Prompt Boundaries and Token Healing”. Medium. Retrieved 2024-08-05.
- ^ Dodge, Jesse; Sap, Maarten; Marasović, Ana; Agnew, William; Ilharco, Gabriel; Groeneveld, Dirk; Mitchell, Margaret; Gardner, Matt (2021). “Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus”. arXiv:2104.08758 [cs.CL].
- ^ Lee, Katherine; Ippolito, Daphne; Nystrom, Andrew; Zhang, Chiyuan; Eck, Douglas; Callison-Burch, Chris; Carlini, Nicholas (May 2022). “Deduplicating Training Data Makes Language Models Better” (PDF). Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1: Long Papers: 8424–8445. doi:10.18653/v1/2022.acl-long.577.
- ^ Li, Yuanzhi; Bubeck, Sébastien; Eldan, Ronen; Del Giorno, Allie; Gunasekar, Suriya; Lee, Yin Tat (2023-09-11), Textbooks Are All You Need II: phi-1.5 technical report, arXiv:2309.05463
- ^ Lin, Zhenghao; Gou, Zhibin; Gong, Yeyun; Liu, Xiao; Shen, Yelong; Xu, Ruochen; Lin, Chen; Yang, Yujiu; Jiao, Jian (2024-04-11). “Rho-1: Not All Tokens Are What You Need”. arXiv:2404.07965 [cs.CL].
- ^ Brown, Tom B.; et al. (2020). “Language Models are Few-Shot Learners”. arXiv:2005.14165 [cs.CL].
- ^ Abdin, Marah; Jacobs, Sam Ade; Awan, Ammar Ahmad; Aneja, Jyoti; Awadallah, Ahmed; Awadalla, Hany; Bach, Nguyen; Bahree, Amit; Bakhtiari, Arash (2024-04-23). “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone”. arXiv:2404.14219 [cs.CL].
- ^ Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan (2022). “Training language models to follow instructions with human feedback”. arXiv:2203.02155 [cs.CL].
- ^ Wang, Yizhong; Kordi, Yeganeh; Mishra, Swaroop; Liu, Alisa; Smith, Noah A.; Khashabi, Daniel; Hajishirzi, Hannaneh (2022). “Self-Instruct: Aligning Language Model with Self Generated Instructions”. arXiv:2212.10560 [cs.CL].
- ^ Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff (2017-01-01). “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer”. arXiv:1701.06538 [cs.LG].
- ^ Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng (2021-01-12). “GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding”. arXiv:2006.16668 [cs.CL].
- ^ Dai, Andrew M; Du, Nan (December 9, 2021). “More Efficient In-Context Learning with GLaM”. ai.googleblog.com. Archived from the original on 2023-03-12. Retrieved 2023-03-09.
- ^ Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). “Emergent Abilities of Large Language Models”. Transactions on Machine Learning Research. ISSN 2835-8856. Archived from the original on 22 March 2023. Retrieved 19 March 2023.
- ^ Alammar, Jay. “Illustrated transformer”. Archived from the original on 2023-07-25. Retrieved 2023-07-29.
- ^ Alammar, Jay. “The Illustrated GPT-2 (Visualizing Transformer Language Models)”. Retrieved 2023-08-01.
- ^ “Our next-generation model: Gemini 1.5”. Google. 15 February 2024. Archived from the original on 18 February 2024. Retrieved 18 February 2024.
- ^ “Long context prompting for Claude 2.1”. December 6, 2023. Archived from the original on August 27, 2024. Retrieved January 20, 2024.
- ^ “Rate limits”. openai.com. Archived from the original on February 2, 2024. Retrieved January 20, 2024.
- ^ Zaib, Munazza; Sheng, Quan Z.; Emma Zhang, Wei (4 February 2020). “A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP”. Proceedings of the Australasian Computer Science Week Multiconference. pp. 1–4. arXiv:2104.10810. doi:10.1145/3373017.3373028. ISBN 9781450376976. S2CID 211040895.
- ^ Jurafsky, Dan; Martin, James H. (7 January 2023). Speech and Language Processing (PDF) (3rd edition draft ed.). Archived (PDF) from the original on 23 March 2023. Retrieved 24 May 2022.
- ^ “From bare metal to a 70B model: infrastructure set-up and scripts”. imbue.com. Archived from the original on 2024-07-26. Retrieved 2024-07-24.
- ^ “metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq”. GitHub. Archived from the original on 2024-01-24. Retrieved 2024-07-24.
- ^ Albrecht, Josh (2024-07-23). “State of the Art: Training >70B LLMs on 10,000 H100 clusters”. www.latent.space. Retrieved 2024-07-24.
- ^ Maslej, Nestor; Fattorini, Loredana; Brynjolfsson, Erik; Etchemendy, John; Ligett, Katrina; Lyons, Terah; Manyika, James; Ngo, Helen; Niebles, Juan Carlos (2023-10-05), Artificial Intelligence Index Report 2023, arXiv:2310.03715
- ^ Section 2.1 and Table 1, Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario (2020). “Scaling Laws for Neural Language Models”. arXiv:2001.08361 [cs.LG].
- ^ Gao, Luyu; Madaan, Aman; Zhou, Shuyan; Alon, Uri; Liu, Pengfei; Yang, Yiming; Callan, Jamie; Neubig, Graham (2022-11-01). “PAL: Program-aided Language Models”. arXiv:2211.10435 [cs.CL].
- ^ “PAL: Program-aided Language Models”. reasonwithpal.com. Archived from the original on 2023-06-12. Retrieved 2023-06-12.
- ^ Paranjape, Bhargavi; Lundberg, Scott; Singh, Sameer; Hajishirzi, Hannaneh; Zettlemoyer, Luke; Tulio Ribeiro, Marco (2023-03-01). “ART: Automatic multi-step reasoning and tool-use for large language models”. arXiv:2303.09014 [cs.CL].
- ^ Liang, Yaobo; Wu, Chenfei; Song, Ting; Wu, Wenshan; Xia, Yan; Liu, Yu; Ou, Yang; Lu, Shuai; Ji, Lei; Mao, Shaoguang; Wang, Yun; Shou, Linjun; Gong, Ming; Duan, Nan (2023-03-01). “TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs”. arXiv:2303.16434 [cs.AI].
- ^ Patil, Shishir G.; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E. (2023-05-01). “Gorilla: Large Language Model Connected with Massive APIs”. arXiv:2305.15334 [cs.CL].
- ^ Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; Küttler, Heinrich; Lewis, Mike; Yih, Wen-tau; Rocktäschel, Tim; Riedel, Sebastian; Kiela, Douwe (2020). “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 9459–9474. arXiv:2005.11401. Archived from the original on 2023-06-12. Retrieved 2023-06-12.
- ^ “The Growth Behind LLM-based Autonomous Agents”. KDnuggets. October 23, 2023.
- ^ Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan (2022-10-01). “ReAct: Synergizing Reasoning and Acting in Language Models”. arXiv:2210.03629 [cs.CL].
- ^ Wu, Yue; Prabhumoye, Shrimai; Min, So Yeon (24 May 2023). “SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning”. arXiv:2305.15486 [cs.AI].
- ^ Wang, Zihao; Cai, Shaofei; Liu, Anji; Ma, Xiaojian; Liang, Yitao (2023-02-03). “Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents”. arXiv:2302.01560 [cs.AI].
- ^ Shinn, Noah; Cassano, Federico; Labash, Beck; Gopinath, Ashwin; Narasimhan, Karthik; Yao, Shunyu (2023-03-01). “Reflexion: Language Agents with Verbal Reinforcement Learning”. arXiv:2303.11366 [cs.AI].
- ^ Hao, Shibo; Gu, Yi; Ma, Haodi; Jiahua Hong, Joshua; Wang, Zhen; Zhe Wang, Daisy; Hu, Zhiting (2023-05-01). “Reasoning with Language Model is Planning with World Model”. arXiv:2305.14992 [cs.CL].
- ^ Zhang, Jenny; Lehman, Joel; Stanley, Kenneth; Clune, Jeff (2 June 2023). “OMNI: Open-endedness via Models of human Notions of Interestingness”. arXiv:2306.01711 [cs.AI].
- ^ “Voyager | An Open-Ended Embodied Agent with Large Language Models”. voyager.minedojo.org. Archived from the original on 2023-06-08. Retrieved 2023-06-09.
- ^ Park, Joon Sung; O’Brien, Joseph C.; Cai, Carrie J.; Ringel Morris, Meredith; Liang, Percy; Bernstein, Michael S. (2023-04-01). “Generative Agents: Interactive Simulacra of Human Behavior”. arXiv:2304.03442 [cs.HC].
- ^ Mann, Tobias. “How to run an LLM locally on your PC in less than 10 minutes”. www.theregister.com. Retrieved 2024-05-17.
- ^ Nagel, Markus; Amjad, Rana Ali; Baalen, Mart Van; Louizos, Christos; Blankevoort, Tijmen (2020-11-21). “Up or Down? Adaptive Rounding for Post-Training Quantization”. Proceedings of the 37th International Conference on Machine Learning. PMLR: 7197–7206. Archived from the original on 2023-06-14. Retrieved 2023-06-14.
- ^ Polino, Antonio; Pascanu, Razvan; Alistarh, Dan (2018-02-01). “Model compression via distillation and quantization”. arXiv:1802.05668 [cs.NE].
- ^ Frantar, Elias; Ashkboos, Saleh; Hoefler, Torsten; Alistarh, Dan (2022-10-01). “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”. arXiv:2210.17323 [cs.LG].
- ^ Dettmers, Tim; Svirschevski, Ruslan; Egiazarian, Vage; Kuznedelev, Denis; Frantar, Elias; Ashkboos, Saleh; Borzunov, Alexander; Hoefler, Torsten; Alistarh, Dan (2023-06-01). “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”. arXiv:2306.03078 [cs.CL].
- ^ Grootendorst, Maarten. “A Visual Guide to Quantization”. newsletter.maartengrootendorst.com. Archived from the original on 31 Jul 2024. Retrieved 2024-07-31.
- ^ Dettmers, Tim; Pagnoni, Artidoro; Holtzman, Ari; Zettlemoyer, Luke (2023-05-01). “QLoRA: Efficient Finetuning of Quantized LLMs”. arXiv:2305.14314 [cs.LG].
- ^ Kiros, Ryan; Salakhutdinov, Ruslan; Zemel, Rich (2014-06-18). “Multimodal Neural Language Models”. Proceedings of the 31st International Conference on Machine Learning. PMLR: 595–603. Archived from the original on 2023-07-02. Retrieved 2023-07-02.
- ^ Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E (2012). “ImageNet Classification with Deep Convolutional Neural Networks”. Advances in Neural Information Processing Systems. 25. Curran Associates, Inc. Archived from the original on 2023-07-02. Retrieved 2023-07-02.
- ^ Antol, Stanislaw; Agrawal, Aishwarya; Lu, Jiasen; Mitchell, Margaret; Batra, Dhruv; Zitnick, C. Lawrence; Parikh, Devi (2015). “VQA: Visual Question Answering”. ICCV: 2425–2433. Archived from the original on 2023-07-02. Retrieved 2023-07-02.
- ^ Li, Junnan; Li, Dongxu; Savarese, Silvio; Hoi, Steven (2023-01-01). “BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models”. arXiv:2301.12597 [cs.CV].
- ^ Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao (2022-12-06). “Flamingo: a Visual Language Model for Few-Shot Learning”. Advances in Neural Information Processing Systems. 35: 23716–23736. arXiv:2204.14198. Archived from the original on 2023-07-02. Retrieved 2023-07-02.
- ^ Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; Lynch, Corey; Chowdhery, Aakanksha; Ichter, Brian; Wahid, Ayzaan; Tompson, Jonathan; Vuong, Quan; Yu, Tianhe; Huang, Wenlong; Chebotar, Yevgen; Sermanet, Pierre; Duckworth, Daniel; Levine, Sergey (2023-03-01). “PaLM-E: An Embodied Multimodal Language Model”. arXiv:2303.03378 [cs.LG].
- ^ Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae (2023-04-01). “Visual Instruction Tuning”. arXiv:2304.08485 [cs.CV].
- ^ Zhang, Hang; Li, Xin; Bing, Lidong (2023-06-01). “Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding”. arXiv:2306.02858 [cs.CL].
- ^ OpenAI (2023-03-27). “GPT-4 Technical Report”. arXiv:2303.08774 [cs.CL].
- ^ OpenAI (September 25, 2023). “GPT-4V(ision) System Card” (PDF).
- ^ Pichai, Sundar (10 May 2023), Google Keynote (Google I/O ’23), timestamp 15:31, retrieved 2023-07-02
- ^ Wiggers, Kyle (11 September 2024). “Mistral releases Pixtral 12B, its first multimodal model”. TechCrunch. Retrieved 14 September 2024.
- ^ “Introducing OpenAI o1-preview”. OpenAI. 2024-09-12. Retrieved 2025-02-03.
- ^ Metz, Cade (2024-12-20). “OpenAI Unveils New A.I. That Can ‘Reason’ Through Math and Science Problems”. The New York Times. Retrieved 2025-02-03.
- ^ Gibney, Elizabeth (2025-01-30). “China’s cheap, open AI model DeepSeek thrills scientists”. Nature. Retrieved 2025-02-03.
- ^ Lin, Belle (2025-02-05). “Why Amazon is Betting on ‘Automated Reasoning’ to Reduce AI’s Hallucinations: The tech giant says an obscure field that combines AI and math can mitigate—but not completely eliminate—AI’s propensity to provide wrong answers”. Wall Street Journal. ISSN 0099-9660.
- ^ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan (2022-03-29). “Training Compute-Optimal Large Language Models”. arXiv:2203.15556 [cs.CL].
- ^ Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). “Broken Neural Scaling Laws”. arXiv:2210.14891 [cs.LG].
- ^ “137 emergent abilities of large language models”. Jason Wei. Retrieved 2023-06-24.
- ^ Bowman, Samuel R. (2023). “Eight Things to Know about Large Language Models”. arXiv:2304.00612 [cs.CL].
- ^ Mukherjee, Anirban; Chang, Hannah (2024). “Heuristic Reasoning in AI: Instrumental Use and Mimetic Absorption”. arXiv:2403.09404 [cs.AI].
- ^ Hahn, Michael; Goyal, Navin (2023-03-14). “A Theory of Emergent In-Context Learning as Implicit Structure Induction”. arXiv:2303.07971 [cs.LG].
- ^ Pilehvar, Mohammad Taher; Camacho-Collados, Jose (June 2019). “Proceedings of the 2019 Conference of the North”. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics: 1267–1273. doi:10.18653/v1/N19-1128. S2CID 102353817. Archived from the original on 2023-06-27. Retrieved 2023-06-27.
- ^ “WiC: The Word-in-Context Dataset”. pilehvar.github.io. Archived from the original on 2023-06-27. Retrieved 2023-06-27.
- ^ Patel, Roma; Pavlick, Ellie (2021-10-06). “Mapping Language Models to Grounded Conceptual Spaces”. ICLR. Archived from the original on 2023-06-24. Retrieved 2023-06-27.
- ^ A Closer Look at Large Language Models Emergent Abilities Archived 2023-06-24 at the Wayback Machine (Yao Fu, Nov 20, 2022)
- ^ Ornes, Stephen (March 16, 2023). “The Unpredictable Abilities Emerging From Large AI Models”. Quanta Magazine. Archived from the original on March 16, 2023. Retrieved March 16, 2023.
- ^ Schaeffer, Rylan; Miranda, Brando; Koyejo, Sanmi (2023-04-01). “Are Emergent Abilities of Large Language Models a Mirage?”. arXiv:2304.15004 [cs.AI].
- ^ Blank, Idan A. (November 2023). “What are large language models supposed to model?”. Trends in Cognitive Sciences. 27 (11): 987–989. doi:10.1016/j.tics.2023.08.006. PMID 37659920.
- ^ Li, Kenneth; Hopkins, Aspen K.; Bau, David; Viégas, Fernanda; Pfister, Hanspeter; Wattenberg, Martin (2022-10-01). “Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task”. arXiv:2210.13382 [cs.LG].
- ^ “Large Language Model: world models or surface statistics?”. The Gradient. 2023-01-21. Retrieved 2023-06-12.
- ^ Jin, Charles; Rinard, Martin (2023-05-01). “Evidence of Meaning in Language Models Trained on Programs”. arXiv:2305.11169 [cs.LG].
- ^ Nanda, Neel; Chan, Lawrence; Lieberum, Tom; Smith, Jess; Steinhardt, Jacob (2023-01-01). “Progress measures for grokking via mechanistic interpretability”. arXiv:2301.05217 [cs.LG].
- ^ Mitchell, Melanie; Krakauer, David C. (28 March 2023). “The debate over understanding in AI’s large language models”. Proceedings of the National Academy of Sciences. 120 (13): e2215907120. arXiv:2210.13966. Bibcode:2023PNAS..12015907M. doi:10.1073/pnas.2215907120. PMC 10068812. PMID 36943882.
- ^ Metz, Cade (16 May 2023). “Microsoft Says New A.I. Shows Signs of Human Reasoning”. The New York Times.
- ^ Bubeck, Sébastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (2023). “Sparks of Artificial General Intelligence: Early experiments with GPT-4”. arXiv:2303.12712 [cs.CL].
- ^ “Anthropic CEO Dario Amodei pens a smart look at our AI future”. Fast Company. October 17, 2024.
- ^ “ChatGPT is more like an ‘alien intelligence’ than a human brain, says futurist”. ZDNET. 2023. Archived from the original on 12 June 2023. Retrieved 12 June 2023.
- ^ Newport, Cal (13 April 2023). “What Kind of Mind Does ChatGPT Have?”. The New Yorker. Archived from the original on 12 June 2023. Retrieved 12 June 2023.
- ^ Roose, Kevin (30 May 2023). “Why an Octopus-like Creature Has Come to Symbolize the State of A.I.” The New York Times. Archived from the original on 30 May 2023. Retrieved 12 June 2023.
- ^ “The A to Z of Artificial Intelligence”. Time Magazine. 13 April 2023. Archived from the original on 16 June 2023. Retrieved 12 June 2023.
- ^ Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Dai, Wenliang; Madotto, Andrea; Fung, Pascale (November 2022). “Survey of Hallucination in Natural Language Generation” (pdf). ACM Computing Surveys. 55 (12). Association for Computing Machinery: 1–38. arXiv:2202.03629. doi:10.1145/3571730. S2CID 246652372. Archived from the original on 26 March 2023. Retrieved 15 January 2023.
- ^ Varshney, Neeraj; Yao, Wenlin; Zhang, Hongming; Chen, Jianshu; Yu, Dong (2023). “A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation”. arXiv:2307.03987 [cs.CL].
- ^ Lakoff, George (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Philosophy; Appendix: The Neural Theory of Language Paradigm. New York: Basic Books. pp. 569–583. ISBN 978-0-465-05674-3.
- ^ Evans, Vyvyan. (2014). The Language Myth. Cambridge University Press. ISBN 978-1-107-04396-1.
- ^ Friston, Karl J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior; Chapter 4 The Generative Models of Active Inference. The MIT Press. ISBN 978-0-262-36997-8.
- ^ Huyen, Chip (October 18, 2019). “Evaluation Metrics for Language Modeling”. The Gradient. Retrieved January 14, 2024.
- ^ Clark, Christopher; Lee, Kenton; Chang, Ming-Wei; Kwiatkowski, Tom; Collins, Michael; Toutanova, Kristina (2019). “BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions”. arXiv:1905.10044 [cs.CL].
- ^ Wayne Xin Zhao; Zhou, Kun; Li, Junyi; Tang, Tianyi; Wang, Xiaolei; Hou, Yupeng; Min, Yingqian; Zhang, Beichen; Zhang, Junjie; Dong, Zican; Du, Yifan; Yang, Chen; Chen, Yushuo; Chen, Zhipeng; Jiang, Jinhao; Ren, Ruiyang; Li, Yifan; Tang, Xinyu; Liu, Zikang; Liu, Peiyu; Nie, Jian-Yun; Wen, Ji-Rong (2023). “A Survey of Large Language Models”. arXiv:2303.18223 [cs.CL].
- ^ openai/simple-evals, OpenAI, 2024-05-28, retrieved 2024-05-28
- ^ openai/evals, OpenAI, 2024-05-28, archived from the original on 2024-05-08, retrieved 2024-05-28
- ^ “Sanitized open-source datasets for natural language and code understanding: how we evaluated our 70B model”. imbue.com. Archived from the original on 2024-07-26. Retrieved 2024-07-24.
- ^ Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R. (November 2020). “CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models”. In Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang (ed.). Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. pp. 1953–1967. arXiv:2010.00133. doi:10.18653/v1/2020.emnlp-main.154.
- ^ Nadeem, Moin and Bethke, Anna and Reddy, Siva (August 2021). “StereoSet: Measuring stereotypical bias in pretrained language models”. In Zong, Chengqing and Xia, Fei and Li, Wenjie and Navigli, Roberto (ed.). Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics. pp. 5356–5371. arXiv:2004.09456. doi:10.18653/v1/2021.acl-long.416.
- ^ Simpson, Shmona and Nukpezah, Jonathan and Kie Brooks and Pandya, Raaghav (17 December 2024). “Parity benchmark for measuring bias in LLMs”. AI and Ethics. Springer. doi:10.1007/s43681-024-00613-4.
- ^ Srivastava, Aarohi; et al. (2022). “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”. arXiv:2206.04615 [cs.CL].
- ^ Lin, Stephanie; Hilton, Jacob; Evans, Owain (2021). “TruthfulQA: Measuring How Models Mimic Human Falsehoods”. arXiv:2109.07958 [cs.CL].
- ^ Zellers, Rowan; Holtzman, Ari; Bisk, Yonatan; Farhadi, Ali; Choi, Yejin (2019). “HellaSwag: Can a Machine Really Finish Your Sentence?”. arXiv:1905.07830 [cs.CL].
- ^ “Prepare for truly useful large language models”. Nature Biomedical Engineering. 7 (2): 85–86. 7 March 2023. doi:10.1038/s41551-023-01012-6. PMID 36882584. S2CID 257403466.
- ^ “Your job is (probably) safe from artificial intelligence”. The Economist. 7 May 2023. Archived from the original on 17 June 2023. Retrieved 18 June 2023.
- ^ “Generative AI Could Raise Global GDP by 7%”. Goldman Sachs. Archived from the original on 18 June 2023. Retrieved 18 June 2023.
- ^ Peng, Zhencan; Wang, Zhizhi; Deng, Dong (13 June 2023). “Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation” (PDF). Proceedings of the ACM on Management of Data. 1 (2): 1–18. doi:10.1145/3589324. S2CID 259213212. Archived (PDF) from the original on 2024-08-27. Retrieved 2024-01-20. Citing Lee et al. 2022.
- ^ Peng, Wang & Deng 2023, p. 8.
- ^ Stephen Council (1 Dec 2023). “How Googlers cracked an SF rival’s tech model with a single word”. SFGATE. Archived from the original on 16 December 2023.
- ^ Alba, Davey (1 May 2023). “AI chatbots have been used to create dozens of news content farms”. The Japan Times. Retrieved 18 June 2023.
- ^ “Could chatbots help devise the next pandemic virus?”. Science. 14 June 2023. doi:10.1126/science.adj2463. Archived from the original on 18 June 2023. Retrieved 18 June 2023.
- ^ Hubinger, Evan (10 January 2024). “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training”. arXiv:2401.05566 [cs.CR].
- ^ Kang, Daniel (2023). “Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks”. arXiv:2302.05733 [cs.CR].
- ^ Wang, Yongge (20 June 2024). “Encryption Based Covert Channel for Large Language Models” (PDF). IACR ePrint 2024/586. Archived (PDF) from the original on 24 June 2024. Retrieved 24 June 2024.
- ^ Stokel-Walker, Chris (November 22, 2023). “ChatGPT Replicates Gender Bias in Recommendation Letters”. Scientific American. Archived from the original on 2023-12-29. Retrieved 2023-12-29.
- ^ Luo, Queenie; Puett, Michael J.; Smith, Michael D. (2023-03-28). “A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube”. arXiv:2303.16281v2 [cs.CY].
- ^ Cheng, Myra; Durmus, Esin; Jurafsky, Dan (2023-05-29), Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models, arXiv:2305.18189
- ^ Kotek, Hadas; Dockum, Rikker; Sun, David (2023-11-05). “Gender bias and stereotypes in Large Language Models”. Proceedings of the ACM Collective Intelligence Conference. CI ’23. New York, NY, USA: Association for Computing Machinery. pp. 12–24. doi:10.1145/3582269.3615599. ISBN 979-8-4007-0113-9.
- ^ Choi, Hyeong Kyu; Xu, Weijie; Xue, Chi; Eckman, Stephanie; Reddy, Chandan K. (2024-09-27), Mitigating Selection Bias with Node Pruning and Auxiliary Options, arXiv:2409.18857
- ^ Zheng, Chujie; Zhou, Hao; Meng, Fandong; Zhou, Jie; Huang, Minlie (2023-09-07), Large Language Models Are Not Robust Multiple Choice Selectors, arXiv:2309.03882
- ^ Heikkilä, Melissa (August 7, 2023). “AI language models are rife with different political biases”. MIT Technology Review. Retrieved 2023-12-29.
- ^ Mehta, Sourabh (2024-07-03). “How Much Energy Do LLMs Consume? Unveiling the Power Behind AI”. Association of Data Scientists. Retrieved 2025-01-27.
- ^ “Artificial Intelligence wants to go nuclear. Will it work?”. NPR. Retrieved 2025-01-27.
- ^ Roy, Dareen (December 19, 2024). “AI’s energy hunger fuels geothermal startups but natgas rivalry clouds future”. Reuters.
4. References of the Chinese entry
- ^ Goled, Shraddha. Self-Supervised Learning Vs Semi-Supervised Learning: How They Differ. Analytics India Magazine. May 7, 2021 [2023-06-08]. (Archived from the original on 2023-06-18).
- ^ Manning, Christopher D. Human Language Understanding & Reasoning. Daedalus. 2022, 151 (2): 127–138 [2023-06-08]. S2CID 248377870. doi:10.1162/daed_a_01905. (Archived from the original on 2023-03-09).
- ^ Carlini, Nicholas; Tramer, Florian; Wallace, Eric; Jagielski, Matthew; Herbert-Voss, Ariel; Lee, Katherine; Roberts, Adam; Brown, Tom B; Song, Dawn; Erlingsson, Ulfar. Extracting Training Data from Large Language Models (PDF). USENIX Security Symposium 6. 2021 [2023-06-08]. (Archived (PDF) from the original on 2023-12-21).
- ^ Kotek, Hadas; Dockum, Rikker; Sun, David. Gender bias and stereotypes in Large Language Models. Proceedings of The ACM Collective Intelligence Conference. CI ’23 (New York, NY, USA: Association for Computing Machinery). 2023-11-05. ISBN 979-8-4007-0113-9. doi:10.1145/3582269.3615599.
- ^ Davidson, Thomas; Bhattacharya, Debasmita; Weber, Ingmar. Roberts, Sarah T.; Tetreault, Joel; Prabhakaran, Vinodkumar; Waseem, Zeerak (eds.). Racial Bias in Hate Speech and Abusive Language Detection Datasets. Proceedings of the Third Workshop on Abusive Language Online (Florence, Italy: Association for Computational Linguistics). 2019-08. doi:10.18653/v1/W19-3504.
- ^ Queenie Luo; Michael J. Puett; Michael D. Smith. A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube. arXiv. (Archived from the original on 2024-04-16).
- ^ Goodman, Joshua, A Bit of Progress in Language Modeling, 2001-08-09, Bibcode:2001cs........8005G, arXiv:cs/0108005
- ^ Kilgarriff, Adam; Grefenstette, Gregory. Introduction to the Special Issue on the Web as Corpus. Computational Linguistics. September 2003, 29 (3): 333–347. ISSN 0891-2017. doi:10.1162/089120103322711569.
- ^ Banko, Michele; Brill, Eric. Scaling to very very large corpora for natural language disambiguation. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics – ACL ’01 (Morristown, NJ, USA: Association for Computational Linguistics). 2001: 26–33. doi:10.3115/1073012.1073017.
- ^ Resnik, Philip; Smith, Noah A. The Web as a Parallel Corpus. Computational Linguistics. September 2003, 29 (3): 349–380 [2024-06-07]. ISSN 0891-2017. doi:10.1162/089120103322711578. (Archived from the original on 2024-06-07).
- ^ Halevy, Alon; Norvig, Peter; Pereira, Fernando. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems. March 2009, 24 (2): 8–12. ISSN 1541-1672. doi:10.1109/MIS.2009.36.
- ^ Chen, Leiyu; Li, Shaobo; Bai, Qiang; Yang, Jing; Jiang, Sanlong; Miao, Yanming. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sensing. 2021, 13 (22): 4712. Bibcode:2021RemS...13.4712C. doi:10.3390/rs13224712.
- ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia. Attention is All you Need (PDF). Advances in Neural Information Processing Systems (Curran Associates, Inc.). 2017, 30 [2024-01-21]. (Archived (PDF) from the original on 2024-02-21).
- ^ Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua. Neural Machine Translation by Jointly Learning to Align and Translate. 2014. arXiv:1409.0473 [cs.CL].
- ^ Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics. 2020, 8: 842–866 [2024-01-21]. S2CID 211532403. arXiv:2002.12327. doi:10.1162/tacl_a_00349. (Archived from the original on 2022-04-03).
- ^ Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma. Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024: 1223–1243 [2024-12-08]. arXiv:2307.10700. doi:10.18653/v1/2024.naacl-long.67.
- ^ Hern, Alex. New AI fake text generator may be too dangerous to release, say creators. The Guardian. 14 February 2019 [20 January 2024]. (Archived from the original on 14 February 2019).
- ^ ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months. Euronews. November 30, 2023 [January 20, 2024]. (Archived from the original on January 14, 2024).
- ^ Heaven, Will. GPT-4 is bigger and better than ChatGPT—but OpenAI won’t say why. MIT Technology Review. March 14, 2023 [January 20, 2024]. (Archived from the original on March 17, 2023).
- ^ Movva, Rajiv; Balachandar, Sidhika; Peng, Kenny; Agostini, Gabriel; Garg, Nikhil; Pierson, Emma. Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024: 1223–1243 [2024-12-08]. arXiv:2307.10700. doi:10.18653/v1/2024.naacl-long.67.
- ^ Parameters in notable artificial intelligence systems. ourworldindata.org. November 30, 2023 [January 20, 2024].
- ^ LMSYS Chatbot Arena Leaderboard. huggingface.co. [June 12, 2024]. (Archived from the original on June 10, 2024).
- ^ Sharma, Shubham. Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost. VentureBeat. 2025-01-20 [2025-01-26] (in American English).
- ^ Zia, Dr Tehseen. Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024. Unite.AI. 2024-01-08 [2024-12-28] (in American English).
- ^ Peng, Bo; et al. RWKV: Reinventing RNNs for the Transformer Era. 2023. arXiv:2305.13048 [cs.CL].
- ^ Merritt, Rick. What Is a Transformer Model?. NVIDIA Blog. 2022-03-25 [2023-07-25]. (Archived from the original on 2023-11-17).
- ^ Gu, Albert; Dao, Tri, Mamba: Linear-Time Sequence Modeling with Selective State Spaces, 2023-12-01, arXiv:2312.00752
- ^ Kaushal, Ayush; Mahowald, Kyle, What do tokens know about their characters and how do they know it?, 2022-06-06, arXiv:2206.02608
- ^ Yennie Jun. All languages are NOT created (tokenized) equal. Language models cost much more in some languages than others. 2023-05-03 [2023-08-17]. (Archived from the original on 2023-08-17).
In other words, to express the same sentiment, some languages require up to 10 times more tokens.
- ^ Petrov, Aleksandar; Malfa, Emanuele La; Torr, Philip; Bibi, Adel. Language Model Tokenizers Introduce Unfairness Between Languages. NeurIPS. June 23, 2023 [September 16, 2023]. arXiv:2305.15425. (Archived from the original on December 15, 2023) – via openreview.net.
- ^ OpenAI API. platform.openai.com. [2023-04-30]. (Archived from the original on April 23, 2023).
- ^ Paaß, Gerhard; Giesselbach, Sven. Pre-trained Language Models. Foundation Models for Natural Language Processing. Artificial Intelligence: Foundations, Theory, and Algorithms. 2022: 19–78 [3 August 2023]. ISBN 9783031231902. doi:10.1007/978-3-031-23190-2_2. (Archived from the original on 3 August 2023).
- ^ Petrov, Aleksandar; Emanuele La Malfa; Torr, Philip H. S.; Bibi, Adel. Language Model Tokenizers Introduce Unfairness Between Languages. 2023. arXiv:2305.15425 [cs.CL].
- ^ Lundberg, Scott. The Art of Prompt Design: Prompt Boundaries and Token Healing. Medium. 2023-12-12 [2024-08-05] (in English).
- ^ Dodge, Jesse; Sap, Maarten; Marasović, Ana; Agnew, William; Ilharco, Gabriel; Groeneveld, Dirk; Mitchell, Margaret; Gardner, Matt. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus. 2021. arXiv:2104.08758 [cs.CL].
- ^ Lee, Katherine; Ippolito, Daphne; Nystrom, Andrew; Zhang, Chiyuan; Eck, Douglas; Callison-Burch, Chris; Carlini, Nicholas. Deduplicating Training Data Makes Language Models Better (PDF). Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. May 2022, 1: Long Papers: 8424–8445. doi:10.18653/v1/2022.acl-long.577.
- ^ Li, Yuanzhi; Bubeck, Sébastien; Eldan, Ronen; Del Giorno, Allie; Gunasekar, Suriya; Lee, Yin Tat, Textbooks Are All You Need II: phi-1.5 technical report, 2023-09-11, arXiv:2309.05463
- ^ Lin, Zhenghao; Gou, Zhibin; Gong, Yeyun; Liu, Xiao; Shen, Yelong; Xu, Ruochen; Lin, Chen; Yang, Yujiu; Jiao, Jian. Rho-1: Not All Tokens Are What You Need. 2024-04-11. arXiv:2404.07965 [cs.CL].
- ^ Brown, Tom B.; et al. Language Models are Few-Shot Learners. 2020. arXiv:2005.14165 [cs.CL].
- ^ Abdin, Marah; Jacobs, Sam Ade; Awan, Ammar Ahmad; Aneja, Jyoti; Awadallah, Ahmed; Awadalla, Hany; Bach, Nguyen; Bahree, Amit; Bakhtiari, Arash. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. 2024-04-23. arXiv:2404.14219 [cs.CL].
- ^ What is instruction tuning?. IBM. [2024-12-09].
- ^ Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan. Training language models to follow instructions with human feedback. 2022. arXiv:2203.02155 [cs.CL].
- ^ Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. 2017-01-01. arXiv:1701.06538 [cs.LG].
- ^ Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. 2021-01-12. arXiv:2006.16668 [cs.CL].
- ^ Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William. Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. 31 August 2022 [19 March 2023]. ISSN 2835-8856. (Archived from the original on 22 March 2023).
- ^ Alammar, Jay. Illustrated transformer. [2023-07-29]. (Archived from the original on 2023-07-25).
- ^ Alammar, Jay. The Illustrated GPT-2 (Visualizing Transformer Language Models). [2023-08-01].
- ^ Paaß, Gerhard; Giesselbach, Sven. Pre-trained Language Models. Foundation Models for Natural Language Processing. Artificial Intelligence: Foundations, Theory, and Algorithms. 2022: 19–78 [3 August 2023]. ISBN 9783031231902. doi:10.1007/978-3-031-23190-2_2. (Archived from the original on 3 August 2023).
- ^ Our next-generation model: Gemini 1.5. Google. 15 February 2024 [18 February 2024]. (Archived from the original on 18 February 2024).
- ^ Long context prompting for Claude 2.1. December 6, 2023 [January 20, 2024]. (Archived from the original on August 27, 2024).
- ^ Rate limits. openai.com. [January 20, 2024]. (Archived from the original on February 2, 2024).
- ^ Zaib, Munazza; Sheng, Quan Z.; Emma Zhang, Wei. A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP. Proceedings of the Australasian Computer Science Week Multiconference. 4 February 2020: 1–4. ISBN 9781450376976. S2CID 211040895. arXiv:2104.10810. doi:10.1145/3373017.3373028.
- ^ Jurafsky, Dan; Martin, James H. Speech and Language Processing (PDF) 3rd edition draft. 7 January 2023 [24 May 2022]. (Archived (PDF) from the original on 23 March 2023).
- ^ Jurafsky, Dan; Martin, James H. Speech and Language Processing (PDF) 3rd edition draft. 7 January 2023 [24 May 2022]. (Archived (PDF) from the original on 23 March 2023).
- ^ Wiggers, Kyle. The emerging types of language models and why they matter. TechCrunch. 28 April 2022 [9 March 2023]. (Archived from the original on 16 March 2023).
- ^ Sharir, Or; Peleg, Barak; Shoham, Yoav. The Cost of Training NLP Models: A Concise Overview. 2020. arXiv:2004.08900 [cs.CL].
- ^ Biderman, Stella; Schoelkopf, Hailey; Anthony, Quentin; Bradley, Herbie; Khan, Mohammad Aflah; Purohit, Shivanshu; Prashanth, USVSN Sai. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. April 2023. arXiv:2304.01373 [cs.CL].
- ^ Maslej, Nestor; Fattorini, Loredana; Brynjolfsson, Erik; Etchemendy, John; Ligett, Katrina; Lyons, Terah; Manyika, James; Ngo, Helen; Niebles, Juan Carlos, Artificial Intelligence Index Report 2023, 2023-10-05, arXiv:2310.03715
- ^ Section 2.1 and Table 1, Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario. Scaling Laws for Neural Language Models. 2020. arXiv:2001.08361 [cs.LG].
- ^ Kiros, Ryan; Salakhutdinov, Ruslan; Zemel, Rich. Multimodal Neural Language Models. Proceedings of the 31st International Conference on Machine Learning (PMLR). 2014-06-18: 595–603 [2023-07-02]. (Archived from the original on 2023-07-02).
- ^ Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems (Curran Associates, Inc.). 2012, 25 [2023-07-02]. (Archived from the original on 2023-07-02).
- ^ Antol, Stanislaw; Agrawal, Aishwarya; Lu, Jiasen; Mitchell, Margaret; Batra, Dhruv; Zitnick, C. Lawrence; Parikh, Devi. VQA: Visual Question Answering. ICCV. 2015: 2425–2433 [2023-07-02]. (Archived from the original on 2023-07-02).
- ^ Li, Junnan; Li, Dongxu; Savarese, Silvio; Hoi, Steven. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. 2023-01-01. arXiv:2301.12597 [cs.CV].
- ^ Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao. Flamingo: a Visual Language Model for Few-Shot Learning. Advances in Neural Information Processing Systems. 2022-12-06, 35: 23716–23736 [2023-07-02]. arXiv:2204.14198. (Archived from the original on 2023-07-02).
- ^ Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae. Visual Instruction Tuning. 2023-04-01. arXiv:2304.08485 [cs.CV].
- ^ Zhang, Hang; Li, Xin; Bing, Lidong. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding. 2023-06-01. arXiv:2306.02858 [cs.CL].
- ^ OpenAI. GPT-4 Technical Report. 2023-03-27. arXiv:2303.08774 [cs.CL].
- ^ OpenAI. GPT-4V(ision) System Card (PDF). September 25, 2023.
- ^ Pichai, Sundar, Google Keynote (Google I/O ’23), timestamp 15:31, 10 May 2023 [2023-07-02]
- ^ Wiggers, Kyle. Mistral releases Pixtral 12B, its first multimodal model. TechCrunch. 11 September 2024 [14 September 2024].
- ^ Introducing OpenAI o1-preview. OpenAI. 2024-09-12 [2025-02-03].
- ^ Introducing OpenAI o1-preview. OpenAI. 2024-09-12 [2025-02-03].
- ^ Metz, Cade. OpenAI Unveils New A.I. That Can ‘Reason’ Through Math and Science Problems. The New York Times. 2024-12-20 [2025-02-03].
- ^ Gibney, Elizabeth. China’s cheap, open AI model DeepSeek thrills scientists. Nature. 2025-01-30 [2025-02-03].
- ^ Metz, Cade. OpenAI Unveils New A.I. That Can ‘Reason’ Through Math and Science Problems. The New York Times. 2024-12-20 [2025-02-03].
- ^ Lei Huang; Weijiang Yu; Weitao Ma. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv. (Archived from the original on 2024-11-28).
- ^ Yucong Duan; Fuliang Tang; Zhendong Guo; Yingtian Mei; Yuxing Wang; Kunguang Wu; Zeyu Yang; Shuaishuai Huang; Shiming Gong. Global Large Language Model EQ and IQ Bias Evaluation -Released by DIKWP -AC Research Group. ResearchGate. 2023. doi:10.13140/RG.2.2.12894.61762 – via ResearchGate (in English).
- ^ Zhou, Karen; Tan, Chenhao. Bouamor, Houda; Pino, Juan; Bali, Kalika (eds.). Entity-Based Evaluation of Political Bias in Automatic Summarization. Findings of the Association for Computational Linguistics: EMNLP 2023 (Singapore: Association for Computational Linguistics). 2023-12 [2023-12-26]. doi:10.18653/v1/2023.findings-emnlp.696. (Archived from the original on 2024-04-24).
- ^ Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong. “Ranking of Large Language Model (LLM) Cultural Bias” –DIKWP Research Group International Standard Evaluation. ResearchGate. 2024. doi:10.13140/RG.2.2.26652.67200 – via ResearchGate.
- ^ Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong. “Ranking of Large Language Model (LLM) Regional Bias” –DIKWP Research Group International Standard Evaluation. ResearchGate. 2024. doi:10.13140/RG.2.2.10019.63529 – via ResearchGate.
- ^ Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong. “The Large Language Model (LLM) Bias Evaluation (Age Bias)” –DIKWP Research Group International Standard Evaluation. ResearchGate. 2024. doi:10.13140/RG.2.2.26397.12006 – via ResearchGate.
- ^ Yucong Duan; Fuliang Tang; Kunguang Wu; Zhendong Guo; Shuaishuai Huang; Yingtian Mei; Yuxing Wang; Zeyu Yang; Shiming Gong. “The Large Language Model (LLM) Bias Evaluation (Occupational Bias)” –DIKWP Research Group International Standard Evaluation. ResearchGate. 2024. doi:10.13140/RG.2.2.23041.67689 – via ResearchGate.
5. Further reading
- Open LLM Leaderboard (a leaderboard that tracks, ranks and evaluates open LLMs and chatbots) (archived copy at the Internet Archive)
- Jurafsky, Dan; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 3rd Edition draft, 2023.
- Zhao, Wayne Xin; et al. (2023). “A Survey of Large Language Models”. arXiv:2303.18223 [cs.CL].
- Kaddour, Jean; et al. (2023). “Challenges and Applications of Large Language Models”. arXiv:2307.10169 [cs.CL].
- Yin, Shukang; Fu, Chaoyou; Zhao, Sirui; Li, Ke; Sun, Xing; Xu, Tong; Chen, Enhong (2024). “A Survey on Multimodal Large Language Models”. National Science Review. 11 (12): nwae403. arXiv:2306.13549. doi:10.1093/nsr/nwae403. PMC 11645129. PMID 39679213.
- “AI Index Report 2024 – Artificial Intelligence Index”. aiindex.stanford.edu. Retrieved 2024-05-05.
- Frank, Michael C. (27 June 2023). “Baby steps in evaluating the capacities of large language models”. Nature Reviews Psychology. 2 (8): 451–452. doi:10.1038/s44159-023-00211-x. ISSN 2731-0574. S2CID 259713140. Retrieved 2 July 2023.