The Future is Now


Episode 34: Gemini is here!


Gemini is finally out, and Florian and I talk about the model and what else happened this week.

GPT Visualisierung: https://bbycroft.net/llm

More information on the Discord server
https://discord.gg/3YzyeGJHth
or at https://mkannen.tech

Gemini is here

Google DeepMind just released their new Gemini models. They come in three sizes: Nano will run on devices such as Pixel phones, Pro powers products such as Bard, and Ultra will be released at the beginning of next year. The models are multimodal and accept audio, video, text, images, and code as input.

Gemini outperforms current state-of-the-art models not only on text-based tasks but also in other modalities.

Test the Pro version now in Bard and read more about the model here and here.

Google found a way to improve math skills in LLMs

LLMs are powerful tools, but they often struggle with tasks that require logical and algorithmic reasoning, such as arithmetic. A team of researchers from Google has developed a new technique to teach LLMs how to perform arithmetic operations using in-context learning and algorithmic prompting. Algorithmic prompting means that the model is given detailed explanations of each step of an algorithm such as addition or multiplication (see the sketch below). The researchers showed that this technique can improve the performance of LLMs on arithmetic problems that are much harder than those seen in the examples. They also demonstrated that LLMs can use algorithmic reasoning to solve complex word problems by interacting with other models that have different skills. This work suggests that LLMs can learn algorithmic reasoning as a skill and apply it to various tasks.

[Figure: results from the paper comparing algorithmic prompting with other prompting techniques.]
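To make the idea concrete, here is a minimal sketch of what an algorithmic prompt for multi-digit addition could look like. The prompt wording and the `build_prompt` helper are illustrative assumptions, not taken from the paper; the point is that the in-context example demonstrates the procedure step by step rather than just showing input/output pairs.

```python
# Minimal sketch of algorithmic prompting (prompt wording is an assumption,
# not copied from the Google paper). Instead of bare input/output pairs,
# the in-context example spells out every step of the addition algorithm
# so the model can imitate the procedure on new, harder problems.

ALGORITHMIC_EXAMPLE = """\
Problem: 128 + 367
Explanation:
Step 1: Add the ones digits: 8 + 7 = 15. Write down 5, carry 1.
Step 2: Add the tens digits plus the carry: 2 + 6 + 1 = 9. Write down 9, carry 0.
Step 3: Add the hundreds digits plus the carry: 1 + 3 + 0 = 4. Write down 4.
Answer: 495"""


def build_prompt(a: int, b: int) -> str:
    """Append a new problem to the worked, step-by-step example."""
    return f"{ALGORITHMIC_EXAMPLE}\n\nProblem: {a} + {b}\nExplanation:"


# The resulting string would be sent to any text-completion LLM.
print(build_prompt(4529, 8734))
```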

Episode 5: Google IO and New Models


In this episode, Florian and I talk about Google IO, PaLM 2, Gemini, and lots of other news from the past week.

GPT-2 neurons:

https://openaipublic.blob.core.windows.net/neuron-explainer/neuron-viewer/index.html

More information on the Discord server
https://discord.gg/3YzyeGJHth
or at https://mkannen.tech/

Google IO Summary

The entire keynote

Google IO happened yesterday and the keynote focused heavily on AI. Some of the things that I found most important are:

PaLM 2 is their new LLM. It comes in different sizes, from small enough to run on Pixel phones to large enough to beat GPT-3.5. It is used in Bard and many of their productivity tools.

Gemini is a multimodal model and a product of the Google DeepMind merger. It is being trained right now and could be a contender for the strongest AI when it comes out. I am quite excited about this release, since DeepMind is my personal favorite for AGI.

Moreover, they showcased the integration of PaLM and other generative AI tools throughout their product suite as a direct response to Microsoft’s Copilot. They applied the same approach to search, incorporating PaLM to deliver an experience reminiscent of Bing GPT. This gives me hope, since their search results outperform Bing’s. Their decision to keep PaLM smaller was likely driven by cost, making it cheaper to operate at search scale.

Google and DeepMind Team Up

Google and DeepMind just announced that they will unite Google Brain and DeepMind into Google DeepMind. This is a good step for both sides: DeepMind needs the computing power of Google to make further progress on AGI, and Google needs the manpower and knowledge of the DeepMind team to quickly catch up to OpenAI and Microsoft. This partnership could give OpenAI a real rival on the way to AGI. I have always liked that DeepMind had a different approach to AGI, and I hope they will continue to push ideas beyond language models.

Zip-NeRF: the next step towards the Metaverse

Neural Radiance Fields (NeRFs), which are used to synthesize high-quality images of 3D scenes, are a class of generative models that learn to represent scenes as continuous volumetric functions, mapping 3D spatial coordinates to RGB colors and volumetric density. Grid-based NeRF representations approximate this continuous function with a discretized grid, which allows for efficient training and rendering. However, these grid-based approaches often suffer from aliasing artifacts, such as jaggies or missing scene content, due to the lack of an explicit understanding of scale.
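To make the "continuous volumetric function" concrete, here is a minimal sketch in which a randomly initialized toy network stands in for a trained NeRF. The network weights and ray parameters are illustrative assumptions; only the interface (a 3D point in, RGB color and density out) and the compositing step follow the standard NeRF formulation.

```python
import numpy as np

# Toy radiance field: maps 3D points to (r, g, b, sigma). A real NeRF is a
# trained MLP with positional encoding; random weights stand in here just
# to show the interface.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 64)), rng.normal(size=(64, 4))

def radiance_field(xyz: np.ndarray) -> np.ndarray:
    """xyz: (N, 3) points -> (N, 4) array of RGB color and density sigma."""
    h = np.tanh(xyz @ W1)
    out = h @ W2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))   # colors squashed into [0, 1]
    sigma = np.log1p(np.exp(out[:, 3:]))      # softplus keeps density >= 0
    return np.concatenate([rgb, sigma], axis=1)

def render_ray(origin, direction, n_samples=64, near=0.0, far=4.0):
    """Composite a pixel color along one ray (standard NeRF quadrature)."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction           # (n_samples, 3)
    out = radiance_field(points)
    rgb, sigma = out[:, :3], out[:, 3]
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))   # segment lengths
    alpha = 1.0 - np.exp(-sigma * delta)               # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans                            # contribution per sample
    return (weights[:, None] * rgb).sum(axis=0)        # final RGB

pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(pixel)
```

Grid-based variants replace the network call with lookups into a discretized grid, which is where the aliasing described above comes from.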

This new paper proposes a novel technique called Zip-NeRF that combines ideas from rendering and signal processing to address the aliasing issue in grid-based NeRFs. This allows for anti-aliasing in grid-based NeRFs, resulting in significantly lower error rates compared to previous techniques. Moreover, Zip-NeRF achieves faster training times, being 22 times faster than current approaches.

This makes NeRFs applicable to VR and AR applications and allows for high-quality 3D scenes. As hardware improves next year, we will see some very high-quality VR experiences.

Stanford and Google let AI roleplay

In a new research paper, Google and Stanford University created a sandbox world where they let 25 AI agents role-play. The agents are based on GPT-3.5 and, in the paper’s evaluation, behaved more believably than human-roleplayed baselines. Future agents based on GPT-4 will be able to act even more realistically and intelligently. This could not only mean better AI NPCs in computer games; it also means that we will not be able to distinguish bots from real people. That is a real danger in a world where public opinion influences so much. As these agents become more human-like, the risk of deep emotional connections increases, especially if a person does not know that they are interacting with an AI.

Deepmind follows OpenAI

Similar to OpenAI, DeepMind has started to work with other companies to build more commercial products. In a recent blog post, they explained how they developed a new video codec and improved auto chapters for YouTube.

If this trend continues, we will see more products developed by DeepMind for other Alphabet companies.

New Biggest Vision Transformer

Google’s new ViT-22B is the largest Vision Transformer model by far, with 22 billion parameters. It has achieved SOTA in numerous benchmarks such as depth estimation, image classification, and semantic segmentation. ViT-22B has been trained on four billion images and can be used for all kinds of computer vision tasks.

This result shows that further scaling of vision transformers can be as valuable as it was for language models. It also indicates that future multimodal models can be improved and that GPT-4 is not the limit.

Google opens Bard

Google’s GPT alternative Bard is now available in the US and UK. Early testers already favor Bing, which also launched image generation this week. Bard is based on LaMDA, an older language model that is not as capable as GPT-4.

New Paper by Google uses Generative AI to train Robots

Google just published the paper “Scaling Robot Learning with Semantically Imagined Experience”, which shows how to use generative image models such as Imagen to create training data for their robot system. This gives the robot a more diverse data set, making it more robust and able to solve unseen tasks. We have seen similar approaches that use simulations for self-driving cars, but this is the first time generative models have been used.
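As a rough illustration of the idea, the sketch below inpaints new objects into a real robot camera frame. The paper uses Google’s Imagen, which is not publicly available, so an open-source inpainting model serves as a stand-in; the model choice, prompts, and file paths are assumptions for illustration.

```python
# Sketch of diffusion-based data augmentation for robot learning, in the
# spirit of the paper (which uses Imagen; an open-source inpainting model
# is a stand-in here). File paths and prompts are illustrative.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU is available

frame = Image.open("robot_episode_frame.png").resize((512, 512))
mask = Image.open("tabletop_mask.png").resize((512, 512))  # white = repaint

# Inpaint unseen objects into the masked region of a real episode; the
# action labels stay valid because the robot's recorded motion is unchanged.
prompts = ["a blue ceramic mug", "a rubber duck", "a crumpled soda can"]
for i, obj in enumerate(prompts):
    result = pipe(
        prompt=f"a photo of {obj} on a table",
        image=frame,
        mask_image=mask,
    ).images[0]
    result.save(f"augmented_{i}.png")
```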

Also from Google, we got a new paper presenting their advances in quantum error correction. By scaling to larger numbers of physical qubits and combining them into logical qubits, they reduce the quantum error rate significantly. This opens up a clear path to better quantum computers simply by scaling them up.
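For context, the exponential suppression that makes "just scaling up" work is usually written as follows. This is the standard surface-code scaling relation stated as background, not a formula quoted from the article:

```latex
% Logical error per cycle of a distance-d surface code:
%   d grows with the number of physical qubits (roughly d^2 of them),
%   Lambda > 1 is the suppression factor gained each time d increases by 2,
%   and Lambda stays above 1 only while the physical error rate p
%   is below the threshold p_th.
\varepsilon_d \approx A \, \Lambda^{-(d+1)/2},
\qquad
\Lambda \sim \frac{p_{\mathrm{th}}}{p}
```

As long as the physical error rate stays below threshold, each increase in code distance multiplies the logical error rate down by another factor of Λ, which is the scaling path the paragraph above alludes to.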
