Observable

Misc

Notebook

  • A collaborative, online notebook platform that comes with libraries loaded to make it fairly straightforward to dive into ad hoc data analysis or produce complete reports.
  • In Observable, a JavaScript cell that contains more than a simple variable assignment (like myVariable = 'Hello World') must be written as a code block (i.e. wrap the lines of code in curly braces, {}, and return the cell's value).
  • You can open your notebook in Safe Mode and edit your work without running it.
    • Good for debugging (e.g. infinite while-loops)
  • If you change code in one cell, the other cells that depend on it update automatically, so there's no need to rerun them.
  • Example: EDA Workflow
    1. From Account’s Home Page → Click New Notebook (top-right)
    2. Select a Workspace (Left Panel) → (Optional) Select a Template (Middle) → Click Create Notebook (Bottom Right)
    3. If no template is selected, a blank notebook with a markdown cell opens
    4. Add Data
      • Click Files (Right Panel, Top)
      • Choose File Attachments or Cloud
        • File Attachments: Click or drag a file and drop into attachments area
    5. Create a Table
      • To the right of the file name in the Attachments window there’s an icon. Click it and a table is created
    6. Create a Scatterplot
      • Click the add-cell icon below the current cell to add a new cell
      • In the pop-up window, type scatter in the filter box and click Scatterplot chart
    7. In Scatterplot code cell,
      • Replace toy data name (e.g. cars) in code with the name of the cell that has your data
      • Replace names in the x and y attributes with the names of the columns from your dataset
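The block-cell rule from the notes above can be sketched in plain JavaScript. In a notebook the cell would be written as `total = { ... }` with a `return` inside the braces. That cell syntax is not valid JavaScript outside Observable, so this sketch wraps the same body in an immediately invoked function. The names `values` and `total` are hypothetical.

```javascript
// Notebook form (Observable cell syntax, not plain JavaScript):
//   total = {
//     let sum = 0;
//     for (const v of values) sum += v;
//     return sum;
//   }

// Plain-JavaScript stand-in for the two cells `values` and `total`.
const values = [1, 2, 3, 4];

const total = (() => {
  let sum = 0;
  for (const v of values) sum += v;
  return sum; // the returned value becomes the cell's value
})();

console.log(total); // 10
```

In the notebook, editing the `values` cell would cause `total` to be recomputed automatically; the plain-JavaScript version has no such reactivity and runs top to bottom once.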

Framework

  • Static Site Generator (SSG) for data apps (e.g. dashboards)

  • This is essentially a {flexdashboard}/Quarto dashboard where everything is in HTML files that are served to the user. I think the big difference is the native reactivity you get with JS. You can use {flexdashboard}/Quarto with {shiny}, but that requires a Shiny server instead of a standalone HTML file.

    • Additionally it can be easily deployed to ObservableHQ which has built-in activity monitoring.
      • In the workspace, click the icon next to the app name to see the user stats
      • User names are not shown for public applications. For private applications, you will see user names.
    • Can also be deployed in other places
  • Misc

    • Data Loaders - Generate data from code, which then becomes part of the site.

      • A data loader can be a shell (.sh), R, Python, or Rust script, etc. that, for example, downloads a data file and converts it into a parquet file.
      • Preprocessing code can also be included.
    • The extra space on the right-side of apps is for a TOC. When using h2 or greater tags (e.g. ##, ###), those sections will show up in a TOC in a panel

      • To remove this space:
        • Open observablehq.config.js >> Scroll to the bottom where you see // toc: true
        • Remove the “//” and replace true with false (i.e. toc: false)
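As a sketch, the relevant part of `observablehq.config.js` would look roughly like this once the line is uncommented. The `title` value is a placeholder, and the rest of the generated file is omitted.

```javascript
// observablehq.config.js (excerpt; other generated options omitted)
export default {
  title: "My Project", // placeholder title
  toc: false // hide the table-of-contents panel on the right
};
```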
  • Advantages of SSG

    • Speed - App doesn’t have to be continually rebuilt. Database doesn’t have to be repeatedly queried (See below)
    • Cost - Database is not continuously queried since the data is sent to the user in the form of a parquet file.
    • Hosting - HTML files can be hosted from anywhere (even an S3 bucket).
    • Security - With dynamic website serving, code is constantly being run to build the dashboard and query databases. These processes are common attack vectors. With SSG, only files get served.
    • Robustness - SSG has fewer moving parts which means fewer things can break
  • Traditional BI app architecture (Dynamic Websites)

    1. Webserver (top-mid) in the cloud gets code to build the dashboard from disk (bottom-left) or from a database (top-left)
    2. Webserver gets the data from a database (top-right)
    3. Webserver serves dashboard to user’s browser (bottom-center).
    4. User performs an interaction which gets sent to the webserver, and the process begins anew.
  • SSG approach

    1. (Left) Webserver (top-mid) in the cloud gets data and code to build dashboard from a database (top-left) and disk (bottom-left) and packages both components (bottom-mid)
    2. (Right) Webserver gets dashboard from package and serves it to the user’s browser
    3. (Right) User performs an interaction which gets sent to the webserver. Webserver generates the new dashboard from the package and serves it to the user’s browser.
    4. (Right) Part of the package that gets sent to the user can be a database (e.g. duckdb, sqlite) which can be queried directly in the browser.
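The difference between the two architectures can be illustrated with a toy simulation that counts database queries. This is not Framework code: `queryDatabase`, `serveDynamic`, and `buildStatic` are hypothetical names, and the "database" is just an in-memory array of made-up rows.

```javascript
// Toy model: count how often the "database" is hit under each approach.
let queryCount = 0;

function queryDatabase() {
  queryCount += 1;
  // Made-up rows standing in for a real query result.
  return [{ species: "Adelie", n: 3 }, { species: "Gentoo", n: 2 }];
}

// Dynamic site: every request rebuilds the page and re-queries the database.
function serveDynamic() {
  const rows = queryDatabase();
  return `<ul>${rows.map((r) => `<li>${r.species}: ${r.n}</li>`).join("")}</ul>`;
}

// Static site: query once at build time, then serve the stored result.
function buildStatic() {
  const page = serveDynamic(); // the single build-time query
  return () => page; // the "server" only hands back the prebuilt page
}

queryCount = 0;
for (let i = 0; i < 100; i++) serveDynamic();
const dynamicQueries = queryCount; // 100

queryCount = 0;
const serveStatic = buildStatic();
for (let i = 0; i < 100; i++) serveStatic();
const staticQueries = queryCount; // 1

console.log({ dynamicQueries, staticQueries });
```

With 100 requests, the dynamic version queries 100 times and the static version once, which is the speed and cost argument in miniature.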
  • Example: Basic App and Workflow in VS Code

    1. Initialize the project and create the bare-bones scaffold from the terminal

      npx @observablehq/framework@latest create
    2. Then you go through a series of prompts

      • Where to create project? For current directory: ./my_project
      • Project title?
      • Include sample files? (i.e. examples to get you started)
      • Install dependencies? Choices: npm, yarn, or no
      • Initialize git repository?
        • Observable stores only the build, not the source code, so it’s a good idea to back up the code somewhere.
    3. Change to project directory: e.g. cd my_project

      • In this example, the project is called “penguins”. The files requirements.txt, penguins.csv, and predictions.csv.py are not created when the project is initialized
        • penguins.csv is the static data file to be used inside dataloader file
          • It can be dragged (in VS Code) from outside the project directory to inside the src directory
        • requirements.txt is the file with the python libraries that’ll be used to transform the data (can also be a model that outputs predictions).
          • It can be dragged (in VS Code) from outside the project directory to inside the project’s root directory
        • predictions.csv.py is the dataloader file
          • The .csv in the file’s name indicates the file type of the output
          • This can be data transformation code or a model with predictions or just about anything
          • In this example, it’s a logistic regression model outputting predictions
      • package.json is for defining all the JavaScript (npm) packages you want to use
        • Any npm module can be added to your application
      • src >> index.md is the dashboard layout file (e.g. chart code, html divs, etc.)
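For comparison, a data loader can also be written in JavaScript. The sketch below is not the logistic-regression loader from this example: it is a minimal, hypothetical `src/summary.csv.js` whose rows are hard-coded placeholders. What it illustrates is the convention that a loader prints its result to standard output, and the output is saved under the name left after dropping the final extension (here `summary.csv`).

```javascript
// Hypothetical data loader: src/summary.csv.js -> available as summary.csv

// Placeholder rows; a real loader would read a file, call an API, or run a model.
const rows = [
  { species: "Adelie", species_predicted: "Adelie" },
  { species: "Gentoo", species_predicted: "Chinstrap" }
];

// Convert an array of flat objects to CSV text.
// No quoting or escaping is done, so this only suits simple values.
function toCsv(records) {
  const header = Object.keys(records[0]);
  const lines = records.map((r) => header.map((k) => r[k]).join(","));
  return [header.join(","), ...lines].join("\n") + "\n";
}

const csv = toCsv(rows);

// A data loader's output is whatever it writes to standard output.
process.stdout.write(csv);
```

The page would then load the result the same way as any other file, e.g. with `FileAttachment("./summary.csv").csv({typed: true})`.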
    4. Install dataloader dependencies (e.g. python)

      python3 -m venv env
      source env/bin/activate
      pip install -r requirements.txt
    5. Start a preview server from terminal

      npm run dev
    6. Build app in index.md

      ---
      theme: dashboard
      ---
      
      # Penguins
      
      ``` js
      const penguins = FileAttachment("./predictions.csv").csv({typed: true});
      ```
      
      <!-- charts in horizontal div -->
      <div class="grid grid-cols-4">
        <div class="card grid-colspan-3">${resize((width) => scatterplot(penguins, {width}))}</div>
        <div class="card grid-colspan-1">${resize((width) => waffleplot(penguins, {width}))}</div>
      </div>
      
      <!-- table -->
      <div class="card" style="padding: 0px">
        ${Inputs.table(penguins)}
      </div>
      
      ``` js
      function scatterplot(data, {width} = {}) {
        return Plot.plot({
          grid: true,
          nice: true,
          color: { legend: true },
          width,
          // marks is an array of layers, drawn in order
          marks: [
            Plot.dot(data, {
              x: "culmen_length_mm",
              y: "culmen_depth_mm",
              stroke: "species"
            }),
            Plot.density(data, {
              x: "culmen_length_mm",
              y: "culmen_depth_mm",
              fill: "species",
              fillOpacity: 0.05
            }),
            // shows the incorrectly predicted points
            Plot.dot(data, {
              filter: (d) => d.species !== d.species_predicted,
              x: "culmen_length_mm",
              y: "culmen_depth_mm",
              r: 6,
              symbol: "square",
              stroke: "currentColor"
            })
          ]
        });
      }
      ```
      
      ``` js
      function waffleplot(data, {width} = {}) {
        return Plot.plot({
          width,
          // marks is an array of layers, drawn in order
          marks: [
            Plot.waffleY(
              data,
              Plot.groupX(
                { y: "count" },
                {
                  x: "species",
                  fill: "species",
                  unit: 5,
                  sort: { x: "y", reverse: true }
                }
              )
            ),
            Plot.ruleY([0])
          ]
        });
      }
      ```
      • Running a file with chart code for the first time will result in the installation of D3, Observable Plot, and other libraries necessary to create that chart. It won’t install the whole Observable Plot library, just the bits necessary to create the chart(s).
      • FileAttachment loads the data, and csv with typed: true infers the column types. The default is typed: false, which treats all columns as strings.
        • Temporarily adding a js cell with display(penguins) and running the file will display the data in JSON format in the preview server if you want to examine it.
      • Plots are made into functions so that they can be called inside HTML divs
      • Inputs (used for the table) comes from Observable Inputs, which is available by default in Framework. It is not part of Plot.
      • class="grid" and class="card" are classes already available in Framework
        • grid-cols-4 says create 4 columns
        • grid-colspan-3 makes the scatter plot take 3 out of the 4 columns (i.e. 3/4 of total width)
        • grid-colspan-1 makes the waffle plot take 1 out of the 4 columns (i.e. 1/4 of total width)
      • resize within each div makes the container react to the size of the window (e.g. mobile).
        • Passing the width parameter into the plot functions lets each plot adjust to the size of its container (i.e. plot fills container and adjusts to size of window)
        • Make sure to include width in each plot function’s arguments and in the Plot.plot call inside the function.
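The `filter` option in the scatterplot keeps only the rows where the true and predicted species differ. The same test can be checked outside of Plot. In this sketch the rows are made-up placeholders; only the column names `species` and `species_predicted` come from the example.

```javascript
// Made-up rows with the two columns used by the filter in scatterplot().
const sample = [
  { species: "Adelie", species_predicted: "Adelie" },
  { species: "Adelie", species_predicted: "Chinstrap" },
  { species: "Gentoo", species_predicted: "Gentoo" },
  { species: "Chinstrap", species_predicted: "Adelie" }
];

// Same predicate as the Plot.dot filter: true when the prediction is wrong.
const isMisclassified = (d) => d.species !== d.species_predicted;

const misclassified = sample.filter(isMisclassified);
const accuracy = 1 - misclassified.length / sample.length;

console.log(misclassified.length, accuracy); // 2 0.5
```

A temporary js cell with `display(penguins.filter((d) => d.species !== d.species_predicted))` would show the corresponding rows of the real data in the preview server.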
    7. Deploy to ObservableHQ

      npm run observable deploy
      • Then you go through a series of prompts
        • Must be logged in, so it’ll ask if you want to log in. (for the first deployment)
          • It gives you a URL and a confirmation code.
          • Go to the URL and make sure the confirmation code matches, click continue, then click authorize your device.
        • Choose the observable workspace to deploy to.
        • Choose a current project or to create a new project.
        • What “slug” do you want to use? (The slug is the short project name that appears in the app’s URL.)
        • Public or Private?
        • Build again before deployment?
          • Probably good practice to do so.
        • What changed in this deploy? (Same thing as a commit message)
      • It gives you a link for the app once it’s deployed.