The best open source software of 2023

InfoWorld’s 2023 Bossie Awards recognize the year’s leading open source tools for software development, data management, analytics, AI, and machine learning.

Page 2 of 2

Spark NLP

Spark NLP is a natural language processing library that runs on Apache Spark with Python, Scala, and Java support. The library helps developers and data scientists experiment with large language models including transformer models from Google, Meta, OpenAI, and others. Spark NLP’s model hub offers more than 20,000 models and pipelines to download for language translation, named entity recognition, text classification, question answering, sentiment analysis, and other use cases. In 2023, Spark NLP released many LLM integrations, a new image-to-text annotator designed for captioning images, support for all major public cloud storage systems, and ONNX (Open Neural Network Exchange) support.

— Isaac Sacolick

StarRocks

Analytics has changed. Companies today often serve complex data to millions of concurrent users in real time, and even petabyte-scale queries must return in seconds. StarRocks is a query engine that combines native code (C++), a cost-based optimizer, vectorized processing using SIMD instructions, caching, and materialized views to handle joins efficiently at scale. StarRocks even provides near-native performance when directly querying data lakes and data lakehouses, including Apache Hudi and Apache Iceberg. Whether you’re pursuing real-time analytics, serving customer-facing analytics, or just wanting to query your data lake without moving data around, StarRocks deserves a look.

— Ian Pointer

TensorFlow.js

TensorFlow.js packs the power of Google’s TensorFlow machine learning framework into a JavaScript package, bringing extraordinary capabilities to JavaScript developers with a minimal learning curve. You can run TensorFlow.js in the browser, on a pure JavaScript stack with WebGL acceleration, or against the tfjs-node library on the server. The Node library gives you the same JavaScript API but runs atop the native TensorFlow C binary for maximum speed and CPU/GPU usage.

If you are a JS developer interested in machine learning, TensorFlow.js is an obvious place to go. It’s a welcome contribution to the JS ecosystem that brings AI into easier reach of a broad community of developers.

— Matthew Tyson

vLLM

The rush to deploy large language models in production has resulted in a surge of frameworks focused on making inference as fast as possible. vLLM is one of the most promising, coming complete with Hugging Face model support, an OpenAI-compatible API, and PagedAttention, an algorithm that achieves up to 20x the throughput of Hugging Face’s transformers library. It’s one of the clear choices for serving LLMs in production today, and new features like FlashAttention 2 support are being added quickly.
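To give a feel for the idea behind PagedAttention, here is a toy sketch in Python. It assumes the general approach of splitting each sequence's key-value cache into fixed-size blocks that are allocated on demand and tracked in a per-sequence block table. The class name `PagedKVCache` and its methods are hypothetical and are not part of vLLM; the sketch only does the bookkeeping and stores no tensors.

```python
class PagedKVCache:
    """Toy block allocator for a paged KV cache (illustrative only, not vLLM code)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}  # seq_id -> list of physical block ids, in logical order
        self.lengths = {}       # seq_id -> number of tokens stored so far

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        # A new physical block is needed only when the last one is full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token("s1") for _ in range(3)]
other = cache.append_token("s2")
```

Because blocks are handed out only as tokens arrive, two sequences can interleave in memory without either one reserving space for its maximum possible length up front.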

— Ian Pointer

Weaviate

The generative AI boom has sparked the need for a new breed of database that can support massive amounts of complex, unstructured data. Enter the vector database. Weaviate offers developers loads of flexibility when it comes to deployment model, ecosystem integration, and data privacy. Weaviate combines keyword search with vector search for fast, scalable discovery of multimodal data (think text, images, audio, video). It also has out-of-the-box modules for retrieval-augmented generation (RAG), which provides chatbots and other generative AI apps with domain-specific data to make them more useful. 
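As a rough illustration of what combining keyword search with vector search can look like, the sketch below blends two scores per document. It is a minimal example under stated assumptions, not Weaviate's implementation: the function names, the term-overlap keyword score (a stand-in for a real ranking function such as BM25), the `alpha` weighting, and the tiny two-dimensional vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms found in the text (simplified keyword match)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank doc ids by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        score = (alpha * cosine(query_vec, doc["vector"])
                 + (1 - alpha) * keyword_score(query, doc["text"]))
        scored.append((score, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Made-up documents with made-up embeddings, for illustration only.
DOCS = [
    {"id": "a", "text": "red running shoes", "vector": [1.0, 0.0]},
    {"id": "b", "text": "crimson sneakers", "vector": [0.9, 0.1]},
    {"id": "c", "text": "red wine glasses", "vector": [0.0, 1.0]},
]

blended = hybrid_search("red shoes", [1.0, 0.0], DOCS, alpha=0.5)
keyword_only = hybrid_search("red shoes", [1.0, 0.0], DOCS, alpha=0.0)
```

With keyword matching alone, "crimson sneakers" ranks last because it shares no terms with the query. Blending in the vector score moves it above "red wine glasses", since its embedding sits close to the query's.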

— Andrew C. Oliver

Zig

Of all the open source projects going today, Zig may be the most momentous. Zig is an effort to create a general-purpose programming language with program-level memory controls that outperforms C, while offering a more powerful and less error-prone syntax. The goal is nothing less than supplanting C as the baseline language of the programming ecosystem. Because C is ubiquitous (i.e., the most common component in systems and devices everywhere), success for Zig could mean widespread improvements in performance and stability. That’s something we should all hope for. Plus, Zig is a good, old-fashioned grassroots project with huge ambition and an open source ethos.

— Matthew Tyson

Copyright © 2023 IDG Communications, Inc.
