APAI 2i08 - AI Black Gold: Datasources - (Feature Launch 2 of 3)
Clean, useful data powers the AI Boom. Get onboard.
Want to join the AI Boom and make some money? Are you looking for datasources?
Monetize your content passively. Use AI tools to automatically ingest your content and make it available in a data marketplace.
Welcome to the AI Revolution.
Table of Contents:
AI’s Hunger for Data
Clean Data is Expensive
Part 2 of 3 - Startup Feature Launch
Domain aware Datasource Marketplace
What’s Next?
AI’s Hunger for Data
AI training algorithms require data.
This is cut and dried.
There are no two ways about it.
Data is king.
Data is queen.
AI companies are hungry for data.
2022 saw everybody giddy about getting AI to make jokes on demand. Then 2023 came, and people started trying to make AI more domain aware. The best approach is multi-layered:
1. Specialized AI Prompt
2. Context aware programmatic engine
3. Fine-tuned (vectorstore) data
4. LLM AI model that is finetuned or domain aware
The internet exploded in 2023 with articles telling you how to write better prompts - this is the least expensive option, and the first step. I cover methods 2 and 3 in a previous issue of APAI.
Number 4, an LLM AI model that is domain aware, is the most expensive option and where the market is heading in 2024.
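To make the layering concrete, here is a minimal Python sketch of how layers 1 to 3 can be combined into one prompt before it reaches the model in layer 4. `LayeredRequest` and `build_prompt` are hypothetical names for illustration, not a NeuralDreams or OpenAI API, and the context and retrieved chunks are assumed to come from your own engine and vectorstore.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LayeredRequest:
    """Hypothetical container for layers 1-3 of the multi-layered approach."""
    system_prompt: str  # layer 1: specialized AI prompt
    context: dict[str, str] = field(default_factory=dict)  # layer 2: facts from a programmatic engine
    retrieved: list[str] = field(default_factory=list)  # layer 3: chunks pulled from a vectorstore


def build_prompt(req: LayeredRequest, question: str) -> str:
    """Assemble the layers into one prompt string to send to an LLM (layer 4)."""
    parts = [req.system_prompt]
    if req.context:
        # Sort keys so the same context always produces the same prompt.
        lines = [f"- {key}: {value}" for key, value in sorted(req.context.items())]
        parts.append("Context:\n" + "\n".join(lines))
    if req.retrieved:
        # Number the sources so the model (and a reviewer) can refer back to them.
        lines = [f"[{i}] {chunk}" for i, chunk in enumerate(req.retrieved, start=1)]
        parts.append("Sources:\n" + "\n".join(lines))
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    request = LayeredRequest(
        system_prompt="You are a compliance assistant. Answer only from the sources.",
        context={"jurisdiction": "USA", "user_role": "exporter"},
        retrieved=["(example chunk from a regulatory datasource)"],
    )
    print(build_prompt(request, "Does this item need an export licence?"))
```

In this sketch, layers 1 to 3 are ordinary data plumbing; the expensive part, training or fine-tuning the model in layer 4, happens elsewhere.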
My work takes me to the bowels of AI - to datacenters, where GPUs run screaming hot while tech startups flush with cash come and contract massive numbers of GPU clusters so they can train domain aware AI models.
Make no mistake: 2024 will see an explosion of tech startups that will no longer use OpenAI, but will build their own moat around proprietary domain aware data sets.
Data is king.
Data is queen.
Clean Data is Expensive
A vertically domain aware AI solution is by far the most practical way to approach the future of AI.
I am here to argue that Artificial General Intelligence is not the future. At least not the future of our current generation - whoever is in the workforce right now.
The amount of parallel processing running right now around the world, crunching out new domain aware AI models non-stop, is staggering. As someone with insight into GPU marketplaces and into tech startups' hunger for the GPUs currently operational in the market, I cannot see any scenario in which an AGI comes and takes our jobs in the foreseeable future.
Prepping data, cleaning data, contextualizing data, covering edge cases - this is astonishingly expensive. I will not venture further into the AGI vs Artificial Domain Intelligence debate (ADI?… I dunno, I just don’t want to spell it out every time). My point is, data is very expensive, and we are years away from even a first draft of a data collection covering our collective knowledge.
Part 2 of 3 - Startup Feature Launch
As mentioned in a previous newsletter, I am heading to Doha (Qatar) at the end of February to showcase NeuralDreams at the WebSummit.
We are launching 3 product features that form the core product offering.
The first, announced yesterday, is a contextualized Ad Delivery Network.
The second, announced today, is a Datasource Marketplace.
What makes it special:
Domain focused datasets
Pre-ingested data, ready for Retrieval Augmented Generation (RAG)
Finetune or add your own dataset, and offer it for sale.
No-Code - yes folks. No-code datasource finetuning. Boom!
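For readers wondering what "ready for RAG" means in practice: documents are split into chunks ahead of time, each chunk is turned into a vector, and a query retrieves the closest chunks to feed into the prompt. The Python sketch below is a toy, assuming plain term-frequency vectors and cosine similarity in place of the learned embeddings and vector database a production system would use; `TinyVectorStore` and its methods are hypothetical names, not the NeuralDreams API.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def to_vector(text: str) -> Counter:
    """Term-frequency vector: a stand-in (assumption) for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store of pre-ingested chunks and their vectors."""

    def __init__(self) -> None:
        self._chunks: list[tuple[str, Counter]] = []

    def add(self, text: str, max_words: int = 50) -> None:
        # Ingestion happens once, ahead of time, so queries only pay for the search.
        for chunk in chunk_text(text, max_words):
            self._chunks.append((chunk, to_vector(chunk)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Score every chunk against the query and keep the k best non-zero matches.
        q = to_vector(query)
        scored = sorted(((cosine(q, vec), chunk) for chunk, vec in self._chunks),
                        key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in scored[:k] if score > 0]
```

Swapping `to_vector` for a real embedding model and the list for a vector database roughly changes the quality of the matches, not the shape of the pipeline.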
What is one dataset you wish was available for your application/business?
Domain aware Datasource Marketplace
(This section previews an upcoming feature for the NeuralDreams platform. The feature is being launched at the WebSummit in Doha (Qatar), February 26-29, 2024.)
In 2023 a significant number of our customers came to us excited to try AI solutions, most of them wanting to focus on their own specialized dataset or domain context. The problem is that the data needed to set these boundaries for the AI application sits in massive, decentralized, unstructured documents.
One customer brought us documents in the following formats: DOC, DOCX, XLS, XML, PDF, CSV, TXT. Yes, the data ingestion was not straightforward at all. We then ingested the data into the vectorstore, and when they found out that we now needed to start from the top to fine-tune the prompts, then add good/bad examples, etc., they paused the project.
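Even the first step, getting plain text out of a mixed pile of files, needs a per-format reader. The Python sketch below illustrates that routing under stated assumptions and is not our ingestion pipeline: TXT, CSV and XML are handled with the standard library, while DOC, DOCX, XLS and PDF are deliberately left unimplemented because they need third-party parsers. `extract_text` and the reader table are hypothetical names.

```python
from __future__ import annotations

import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path


def _read_txt(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


def _read_csv(data: bytes) -> str:
    # Flatten each row into one comma-separated line of trimmed cells.
    rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return "\n".join(", ".join(cell.strip() for cell in row) for row in rows)


def _read_xml(data: bytes) -> str:
    # Keep only the text nodes; tags and attributes are dropped.
    root = ET.fromstring(data)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


READERS = {".txt": _read_txt, ".csv": _read_csv, ".xml": _read_xml}
# Assumption: these formats need external libraries, which this sketch leaves out.
NEEDS_EXTERNAL_PARSER = {".doc", ".docx", ".xls", ".pdf"}


def extract_text(filename: str, data: bytes) -> str:
    """Route a file to a reader based on its (case-insensitive) extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in READERS:
        return READERS[suffix](data)
    if suffix in NEEDS_EXTERNAL_PARSER:
        raise NotImplementedError(f"{suffix} needs a third-party parser")
    raise ValueError(f"unsupported file type: {filename!r}")
```

Raising on the unsupported formats, rather than silently skipping them, makes it obvious which parts of a customer's document pile still need work. For untrusted XML, the Python documentation recommends a hardened parser such as defusedxml.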
In 2024 NeuralDreams is launching a marketplace for pre-processed, vertically focused datasources that are being updated on a regular basis. (The documents ingested are all from public
The initial focus is on large domains that benefit a wide portion of the population: compliance and regulatory verticals.
Initial Launch (Planned)
USA ITAR & EAR Datasources - OpenAI has opened up its terms and conditions to allow companies to build applications for defence and military use.
DSM-5-TR Datasources (Planned) - The Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR)
Various Canadian/US publicly available regulatory & compliance documents.
Various Social Media Content Providers
Example - Finetuned ITAR query/retrieval application built on top of the base ITAR datasource available in the marketplace.
How do you monetize this feature?
Say you have a newsletter and/or YouTube channel where you review hand creams, makeup, etc. NeuralDreams will automatically ingest your newsletter issues and YouTube videos and add them to your datasource. You can then license this datasource or make it available on the marketplace as an extra revenue stream.
The datasource is 100% private, secure and cannot be accessed by anybody except via our secured, on-demand interfaces.
Example of Datasource for the APAI Newsletter you are reading now
The datasource app is subscribed to the Substack newsletter, like any other subscriber. When you “Publish” a new issue of the newsletter, the email is automatically processed by the native NeuralDreams Substack component. If you have turned on the Ad Network component, the issue is also automatically processed for Ad delivery.
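The native NeuralDreams Substack component is not public, so the following is only a rough Python illustration of the first step, assuming the issue arrives as a raw RFC 5322 email: parse it with the standard `email` package and keep the subject, sender and body as a datasource record. `email_to_record` and its record schema are hypothetical; a real pipeline would also strip footers and handle HTML-only messages more carefully.

```python
from __future__ import annotations

from email import message_from_bytes, policy


def email_to_record(raw: bytes) -> dict[str, str]:
    """Turn one raw newsletter email into a simple datasource record (hypothetical schema)."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "subject": str(msg["subject"] or ""),
        "sender": str(msg["from"] or ""),
        "body": body.strip(),
    }
```

Using `policy.default` gives back an `EmailMessage`, which provides `get_body()` and `get_content()` and takes care of transfer encodings and charsets.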
How much does it cost? It’s free.
What should you do?
Be bold: be an early adopter, or leave a comment below.
What’s Next?
Getting Started with NeuralDreams Apps:
Sign up for the Datasource Marketplace Beta List - indicate “Datasource Marketplace beta” in your message.
Do not be shy: contact me for details. If you are reading this, either you could use a datasource or you have data to offer. Either way, send me a message.
Do you want a finetuned dataset for a specific application or a baseline application you can finetune yourself?