General

Misc

  • What ChatGPT is:
    • A “what would a response to this question sound like” machine. Researchers build (train) large language models like GPT-3 and GPT-4 using a process called “unsupervised learning,” which means the data they use to train the model isn’t specially annotated or labeled. During this process, the model is fed a large body of text (millions of books, websites, articles, poems, transcripts, and other sources) and repeatedly tries to predict the next word in every sequence of words. If the model’s prediction is close to the actual next word, the neural network updates its parameters to reinforce the patterns that led to that prediction.

      Conversely, if the prediction is incorrect, the model adjusts its parameters to improve its performance and tries again. This process of trial and error, through a technique called “backpropagation,” allows the model to learn from its mistakes and gradually improve its predictions during the training process. As a result, GPT learns statistical associations between words and related concepts in the data set.

      In the current wave of GPT models, this core training (now often called “pre-training”) happens only once. After that, people can use the trained neural network in “inference mode,” which lets users feed an input into the trained network and get a result. During inference, the input sequence for the GPT model is always provided by a human, and it’s called a “prompt.” The prompt determines the model’s output, and altering the prompt even slightly can dramatically change what the model produces. Iterative prompting is limited by the size of the model’s “context window,” since each prompt is appended onto the previous prompt. ChatGPT is different from vanilla GPT-3 because it has also been trained on transcripts of conversations written by humans. OpenAI wrote: “We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant.”

      ChatGPT has also been tuned more heavily than GPT-3 using a technique called “reinforcement learning from human feedback,” or RLHF, where human raters ranked ChatGPT’s responses in order of preference, then fed that information back into the model. This has allowed ChatGPT to produce coherent responses with fewer confabulations than the base model. Accurate output can stem from the prevalence of accurate content in the data set, recognition of factual information in the results by humans, or reinforcement learning guidance from humans that emphasizes certain factual responses.

      LLMs like ChatGPT might produce two major types of falsehoods. The first comes from inaccurate source material in the training data set, such as common misconceptions (e.g., “eating turkey makes you drowsy”). The second arises from making inferences about specific situations that are absent from the training data set; this falls under the “hallucination” (confabulation) label.

      Whether the GPT model makes a wild guess or not is based on a property that AI researchers call “temperature,” which is often characterized as a “creativity” setting. If the creativity is set high, the model will guess wildly; if it’s set low, it will spit out data deterministically based on its data set. If creativity is set low, “[It] answers ‘I don’t know’ all the time or only reads what is there in the Search results (also sometimes incorrect). What is missing is the tone of voice: it shouldn’t sound so confident in those situations.”
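
As a rough illustration of the temperature setting described above, here is a minimal sketch of temperature-scaled sampling over next-token scores. The candidate tokens and their scores are made up for the example, and the function names are hypothetical; real models work over vocabularies of tens of thousands of tokens, and this is not a claim about how any particular product implements it.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature, rng=random):
    """Draw one token at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidate tokens and scores, for illustration only.
tokens = ["Paris", "London", "banana"]
logits = [4.0, 2.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # nearly all probability on "Paris"
high = softmax_with_temperature(logits, 2.0)  # probability spread across all three
```

At low temperature the top-scoring token is picked almost every time, which matches the "deterministic" behavior in the notes; at high temperature, unlikely tokens such as "banana" are sampled noticeably more often.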

      In some ways, ChatGPT is a mirror: It gives you back what you feed it. If you feed it falsehoods, it will tend to agree with you and “think” along those lines. That’s why it’s important to start fresh with a new prompt when changing subjects or experiencing unwanted responses.

      “One of the most actively researched approaches for increasing factuality in LLMs is retrieval augmentation—providing external documents to the model to use as sources and supporting context,” said Goodside. With that technique, he explained, researchers hope to teach models to use external search engines like Google, “citing reliable sources in their answers as a human researcher might, and rely less on the unreliable factual knowledge learned during model training.” Bing Chat and Google Bard do this already by roping in searches from the web, and soon, a browser-enabled version of ChatGPT will as well. Additionally, ChatGPT plugins aim to supplement GPT-4’s training data with information it retrieves from external sources, such as the web and purpose-built databases.

      Other things that might help with hallucination include “a more sophisticated data curation and the linking of the training data with ‘trust’ scores, using a method not unlike PageRank… It would also be possible to fine-tune the model to hedge when it is less confident in the response.” (Ars Technica article)
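
The retrieval augmentation idea quoted above can be sketched as "find relevant documents, then paste them into the prompt as numbered sources." The version below is a toy under stated assumptions: it scores documents by simple word overlap (production systems typically use a search engine or embedding search), and the function names, documents, and prompt wording are all hypothetical.

```python
def tokenize(text):
    """Lowercase the text and strip basic punctuation to get a set of words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question, documents, top_k=1):
    """Return the top_k documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Prepend the retrieved documents as numbered sources the model is asked to cite."""
    sources = retrieve(question, documents, top_k)
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(sources))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical document store, for illustration only.
documents = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "PageRank ranks web pages by their incoming links.",
]
prompt = build_prompt("When did the Eiffel Tower open?", documents)
```

The resulting `prompt` string would then be sent to the model in place of the bare question, so the answer can lean on the supplied source instead of whatever was memorized during training.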

  • OpenAI models
    • davinci (e.g. davinci-003) text-generation models are 10x more expensive than their chat counterparts (e.g. gpt-3.5-turbo)
    • For lower usage, in the range of thousands of requests per day, ChatGPT works out cheaper than open-source LLMs deployed to AWS. For millions of requests per day, open-source models deployed to AWS work out cheaper. (As of April 24th, 2023.) (article)
      • Used AWS Lambda for deployment
    • davinci hasn’t been trained using reinforcement learning from human feedback (RLHF)
    • GPT-3.5 Turbo models
      • Pros
        • Performs better on zero-shot classification tasks than Davinci-003
        • Outperforms Davinci-003 on sentiment analysis
        • Significantly better than Davinci-003 at math
        • Cheaper than Davinci-003
      • Cons
        • Tends to produce longer responses than Davinci-003, which may not be ideal for all use cases
        • Including k-shot examples can lead to inefficient resource usage in multi-turn use cases
    • davinci-003
      • Pros
        • Performs slightly better than GPT-3.5 Turbo with k-shot examples
        • Produces more concise responses than GPT-3.5 Turbo, which may be preferable for certain use cases
      • Cons
        • Less accurate than GPT-3.5 Turbo on zero-shot classification tasks and sentiment analysis
        • Performs significantly worse than GPT-3.5 Turbo on math tasks
  • Use Cases
    • Understanding code (can reduce cognitive load) (article)
      • During code reviews or when onboarding new programmers
      • Under-commented code
    • Generating the code scaffold for a problem where you aren’t sure where or how to start solving it.
    • LLMs don’t require removing stopwords during preprocessing of documents
  • Generate “Impossibility” List (source)
    • “I suggest that people and organizations keep an ‘impossibility list’ - things that their experiments have shown that AI can definitely not do today but which it can almost do. . . . When AI models are updated, test them on your impossibility list to see if they can now do these impossible tasks.” - Ethan Mollick, “Gradually, then Suddenly: Upon the Threshold”
  • Cost
    • For lower usage, in the range of thousands of requests per day, ChatGPT works out cheaper than open-source LLMs deployed to AWS. For millions of requests per day, open-source models deployed to AWS work out cheaper. (article, April 24th, 2023.)
  • Methods for giving chatGPT data
    • Upload a file (I think this is possible)
    • Through prompt
      • See bizsci video
        • Paste actual data
        • Paste column names and types (glimpse() with no values)
      • Generate a string for each row of data that contains the column name and value
        • Example
          • “The <column name> is <cell value>. The <column name> is <cell value>. …”
          • “The fico_score is 578.0. The loan_amount is 6000.0. The annual_income is 57643.54.”
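
The row-to-string approach above can be sketched in a few lines. This is a minimal illustration assuming each row is available as a dictionary of column name to value; the function name `row_to_sentence`, the column names, and the second record are illustrative, not taken from the source.

```python
def row_to_sentence(row):
    """Render one record as a run of 'The <column name> is <cell value>.' sentences."""
    return " ".join(f"The {column} is {value}." for column, value in row.items())


# Illustrative records; in practice these would come from a data frame or CSV file.
rows = [
    {"fico_score": 578.0, "loan_amount": 6000.0, "annual_income": 57643.54},
    {"fico_score": 701.0, "loan_amount": 12500.0, "annual_income": 88100.0},
]

sentences = [row_to_sentence(row) for row in rows]
prompt_data = "\n".join(sentences)  # one line per row, ready to paste into a prompt
```

With pandas, the same function could be applied to the output of `df.to_dict(orient="records")`. Long tables will still run into the context window limit, so this suits small samples best.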
  • Evolution of LLMs

Hosting

  • Cloud-based online platforms
    • Platforms like OpenAI’s GPT Store and Hugging Face Spaces allow developers to focus on prompt engineering and interaction design without configuring hardware, environments, or web frameworks. However, they have the following limitations:
      • Privacy related to individual or commercial information.
      • Latency due to remote servers and shared GPU resource pools.
      • Cost for remote API calls or on-demand servers.
  • Managed self-hosted applications
    • Self-hosted applications that rely on a managed stack or framework like Ollama+OpenWebUI offer ready-to-use templates for running various LLM applications locally. This solution is attracting attention because models like Llama 3 (8B) can easily run on a PC with a 16 GB GPU. However, the solution is limited by:
      • Complexity in setup and maintenance.
      • Inflexibility due to limited customization.
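
For a sense of what talking to such a managed local stack looks like, here is a hedged sketch of a client for a locally running Ollama server. It assumes Ollama's default local address (`http://localhost:11434`) and its `/api/generate` endpoint, and it assumes a model tagged `llama3` has already been pulled; the helper names `build_request` and `generate` are hypothetical. Only the request is built at import time, so nothing is sent unless `generate` is called.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build (but do not send) a non-streaming generation request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=60):
    """Send the prompt to the local server; requires Ollama to be running."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


request = build_request("Why is the sky blue?")
```

Because the request never leaves the machine, this setup avoids the privacy and per-call cost concerns listed for cloud platforms, at the price of the setup and customization limits noted above.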
  • Custom self-hosted applications
    • To overcome the limitations of the managed self-hosted solution, an alternative is to create custom self-hosted applications, which use custom-built components across the stack.