
Building in public: realtime sentiment analysis, pt.1

Emiliano García-López


Building in Public: An Introduction

Building in public refers to the business practice where an entrepreneur creates a product in front of an audience.

When you build in public, you document everything about the building process: where you've succeeded and where you've hit a setback. You show your minimum viable product with all its bugs and slowly iterate until you approach a product fit for a mass-market release.

Entrepreneurs choose to do this because seeing how difficult it is to make a product instills a sense of appreciation in consumers. It also opens a channel of communication with the audience, which gives the entrepreneur the feedback needed to refine the product.

What I'm Building in Public

Sentiment analysis is all the rage; for example, if you search "sentiment analysis python," you'll see hundreds of tutorials, many of them relating to stock trading. However, none of these models are precise enough that they could reasonably be used to make money.

One shortcoming is that none can accurately capture the importance of the decoded sentiment. For example, you could feed the model a news article about how AAPL will face anticipated supply chain issues during Q4. The model will say something like "78% Bad," which discards important information about how much the news actually matters.

Of course, it'll be hard to build a service that does better, and I might not finish in time, so I'll build it in stages, where each stage could be a standalone product.

News Aggregator API

Currently, there are no unified news aggregator APIs that provide entire articles. Websites like newsapi.org are helpful, but they only return the first few words of the article.

Below I have an example of what NewsAPI returns when queried for articles involving the keyword "apple."

// selected response for following query: https://newsapi.org/v2/everything?q=apple&from=2022-10-01&to=2022-10-01&sortBy=popularity&apiKey=API_KEY
{
  "source": {
    "name": "Wired"
  },
  "author": "Boone Ashworth",
  "title": "Razer Teases a New Handheld Gaming Console",
  "description": "Plus more Gear news: Intel’s new app syncs PCs and mobile phones, and Apple slows down iPhone production.",
  "url": "https://www.wired.com/story/razer-edge-5g-teased/",
  "content": "Razer wants to Switch it up. The maximalist manufacturer of gaming hardware announced this week that it is developing a new handheld gaming device with 5G connectivity. The creation is the result of … [+3091 chars]"
},
// I've chosen to only display the first article

As you can see, it cuts off after a few hundred characters. NewsAPI provides a page where they explain how to use the URL they return to scrape the full article yourself, but at that point, you're already doing most of the work — paying the $449/month isn't worth it.
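
To make the cutoff concrete, here is a minimal sketch that detects the truncation marker in a `content` field like the one above. The function name `split_truncated` is my own, and I'm assuming the number in `[+3091 chars]` is the count of characters left out.

```python
import re

# Matches a trailing marker such as "… [+3091 chars]".
# Assumption: the marker always sits at the very end of the field.
TRUNCATION_RE = re.compile(r"\s*…?\s*\[\+(\d+) chars\]\s*$")


def split_truncated(content: str) -> tuple[str, int]:
    """Return (visible_text, missing_chars); missing_chars is 0 if nothing was cut."""
    match = TRUNCATION_RE.search(content)
    if match is None:
        return content, 0
    return content[: match.start()], int(match.group(1))


sample = "Razer wants to Switch it up. The creation is the result of … [+3091 chars]"
visible, missing = split_truncated(sample)
print(f"visible: {len(visible)} chars, missing: {missing} chars")
```

A check like this would let an aggregator decide which articles still need a full fetch.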

The API I will make will use the FastAPI Python framework and will be entirely self-hosted. If the project gets some traction, I'll make a fully managed solution.
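
Whatever framework serves it, the core of such an API is the step that turns a fetched page into full article text. Below is a rough sketch of that step using only the standard library. The FastAPI route and the HTTP fetch are left out, the names `ParagraphExtractor` and `extract_article_text` are hypothetical, and treating every `<p>` tag as article text is a simplifying assumption that real news sites will often break.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements and ignores everything else."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0                  # > 0 while inside a <p> element
        self._chunks: list[str] = []     # text pieces of the current paragraph
        self.paragraphs: list[str] = []  # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace inside the paragraph.
                text = " ".join("".join(self._chunks).split())
                if text:
                    self.paragraphs.append(text)
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_article_text(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


page = "<html><body><nav>Menu</nav><p>First <b>bold</b> line.</p><p>Second line.</p></body></html>"
print(extract_article_text(page))
```

Keeping the extraction separate from the web layer means it can be tested on saved HTML without running a server.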

Article Info Labeler

The second product will be another API that takes an article as input and returns a labeled version of that article. So, for example, it would label all of the companies involved and estimate the effect the news has on each company mentioned.
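
As a rough illustration of what "labeled" could mean, the sketch below tags company mentions with character offsets. It is a stand-in, not the planned model: the lookup table `KNOWN_COMPANIES`, the `CompanyLabel` shape, and the placeholder `effect` value are all assumptions of mine, and a real labeler would use a trained token classification model instead of string matching.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup table, used here only to keep the example self-contained.
KNOWN_COMPANIES = {"Apple": "AAPL", "Intel": "INTC"}


@dataclass
class CompanyLabel:
    name: str
    ticker: str
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    effect: str  # placeholder; estimating this is the hard part


def label_companies(text: str) -> list[CompanyLabel]:
    """Find whole-word mentions of known companies, ordered by position."""
    labels = []
    for name, ticker in KNOWN_COMPANIES.items():
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            labels.append(CompanyLabel(name, ticker, m.start(), m.end(), "unknown"))
    return sorted(labels, key=lambda label: label.start)


text = "Intel's new app syncs PCs and mobile phones, and Apple slows down iPhone production."
for label in label_companies(text):
    print(label)
```

Returning offsets rather than rewritten text keeps the original article intact, so a later stage can attach an effect estimate to each span.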

The following image is from an existing Token Classification model, and the one I'm planning to construct would have more finance-specific classifications.

Analysis & Summary

Lastly, the final piece of the puzzle would extract all critical information and return a prognosis to the user. Having a model produce a coherent prognosis with a high level of certainty is difficult. Still, something like this would be a valuable tool for traders needing to consume large amounts of information.

An example response could be the following:

{
  "stock": "Apple",
  "timeframe": "10-01-2022, 10-25-2022",
  "effect": {
    "direction": "negative",
    "magnitude": "large"
  },
  "keywords": [
    "supply chain",
    "china",
    "import tariffs",
    "shipping"
  ]
}
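
On the consuming side, a client could load a response like this into a typed structure. The sketch below is one way to do that. The `Prognosis` class and `parse_prognosis` function are hypothetical, and it assumes `timeframe` always holds two MM-DD-YYYY dates separated by a comma, as in the example.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Prognosis:
    stock: str
    start: date
    end: date
    direction: str
    magnitude: str
    keywords: list[str]


def parse_prognosis(raw: str) -> Prognosis:
    data = json.loads(raw)
    # Assumption: "timeframe" is "MM-DD-YYYY, MM-DD-YYYY" (start, end).
    start, end = (
        datetime.strptime(part.strip(), "%m-%d-%Y").date()
        for part in data["timeframe"].split(",")
    )
    return Prognosis(
        stock=data["stock"],
        start=start,
        end=end,
        direction=data["effect"]["direction"],
        magnitude=data["effect"]["magnitude"],
        keywords=list(data["keywords"]),
    )


raw = """
{
  "stock": "Apple",
  "timeframe": "10-01-2022, 10-25-2022",
  "effect": {"direction": "negative", "magnitude": "large"},
  "keywords": ["supply chain", "china", "import tariffs", "shipping"]
}
"""
prognosis = parse_prognosis(raw)
print(prognosis.start.isoformat(), prognosis.end.isoformat(), prognosis.direction)
# prints: 2022-10-01 2022-10-25 negative
```

Parsing the dates up front surfaces a malformed timeframe as an error immediately instead of letting it pass through as an opaque string.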

About Emiliano García-López

Hi! I'm Emiliano García-López, a CMU student, web developer, and co-founder of Paisley Microsystems.

Copyright © 2024 Emiliano García-López. All rights reserved.