The Future is Now

Tag: Microsoft

Copilots for everyone

Microsoft Build is currently underway, with Microsoft showcasing a range of new and upcoming products, including various Copilots such as Copilot for Bing, GitHub, and Edge. In their pipeline, they also have plans to launch a Copilot specifically designed for Windows.

These Copilots are all built using Microsoft’s new Azure AI Studio Platform, which is now open to developers, allowing them to create their own Copilots.

Furthermore, Microsoft announced their support for an open plugin system, similar to the one utilized by ChatGPT, making plugins accessible to all Copilots. If this solution becomes the industry standard for AI systems, it has the potential to establish Microsoft as a dominant player in the AI market. The first day of Microsoft Build concluded with an exceptional presentation by Andrej Karpathy, delving into the history and inner workings of GPT models. If you’re interested in gaining insights into how these models operate and learn, I highly recommend watching his talk titled “State of GPT.”

Microsoft Improves Bing Chat Again

Microsoft announced, that not only Bing Chat is now available for everyone but also that Bing Chat will get new features such as image search, and more ways to present visual information. They also add the ability to summarise PDFs and other types of content.

But the biggest news is that they bring plugins to Bing Chat, which will work similarly to the ChatGPT plugins. I recommend reading the entire announcement yourself. This is the first step to their promise of a copilot for the web and I think they are doing a good job. This also puts pressure on their partner OpenAI which work on their own improvements to ChatGPT and now have to fight against their Investor Microsoft.

Text-to-Speech is reaching a critical point

Today, Microsoft published a paper called “NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers“. In this paper, they show a new text-to-speech model which is not only able to copy human speech, but also singing. The model uses a latent diffusion model and neural audio codec to synthesize high-quality, expressive voices with strong zero-shot ability by generating quantized latent vectors conditioned on text input.

With this model, we are reaching a critical point. text-to-speech is now good enough to fool people and replace many jobs and positions that require speech. It also allows for better speech interfaces to language models which makes the interaction more natural from now on. As we are approaching a future where people have personal Ai assistants, natural speech is a core technology. And even though NaturalSpeech 2 is not perfect it is good enough to start this future.

The New Wave of GPT Agents

Since GPT-3.5 and GPT-4 APIs are available many companies and start-ups have implemented them into their products. Now developers have started to do it the other way around. They build systems around GPT-4 to enable it to search, use APIs, execute code, and interact with itself. Examples are HuggingGPT or AutoGPT. They are based on works like Toolformer or this result. Even Microsoft itself started to build LLM-Augmenter around GPT-4 to improve its performance.

I talked about this development in my post on how to get from GPT-4 to proto-AGI. I still think that this is the way to a general assistant even though I am not sure if GPT-4 is already capable enough or if we need another small improvement.

Microsoft presents its copilot for Office

Today Microsoft showed off how they integrated AI tools, including GPT-4, into their office products. You can ask Copilot to build excel tables, PowerPoints, and Emails or ask it about meetings, or lets it summarise documents and chats.

Copilot in Office

Although currently only available to a select few companies, Copilot is set to become widely available over the next few months. This integration of AI technology has the potential to significantly increase productivity for office workers and could have far-reaching implications for the economy as a whole.

MathPrompter: Mathematical Reasoning using Large Language Models

Microsoft published a new paper in which they present the language model MathPrompter which uses the Zero-shot chain-of-thought prompting technique to generate multiple Algebraic expressions or Python functions to solve the same math problem in different ways and thereby raise the confidence level in the output results. This led to a score of 92.5 on the MultiArith dataset which is beating current sota results by far.

LLMs that use APIs like Toolformer or run their own generated code are a recent development that gives promising results and enables many new capabilities.

GPT-4 Next Week

In a small german information event today, four Microsoft employees talked about the potential of LLMs and mentioned that they are going to release GPT-4 next week. They implied that GPT-4 will be able to work with video data, which implies a multimodal model comparable to PaLM-E. Read more here.

© 2024 Maximilian Kannen

Theme by Anders NorenUp ↑