
NVIDIA Releases Easy-to-Install AI Chat ‘Chat with RTX’: Trial Report and System Requirements

On February 14, 2024, NVIDIA unexpectedly released the AI chat application "Chat with RTX." Until now, running a large language model (LLM) locally has taken some effort, but Chat with RTX ships with an easy installer and GUI, so it is simple to try. I immediately ran it on Windows with a GeForce RTX 3090 (connected via a GPU box), so here is a trial report.

What is NVIDIA Chat with RTX?

I have written several articles on running AI chat on a local machine, and most of them used text-generation-webui as the front end plus a separately downloaded model.

NVIDIA's release feels similar, but the front end is NVIDIA's own web UI, the back end ships with two models as standard, everything is distributed as one package, and it can be set up and used by running a single setup.exe, making for a hassle-free chat environment. The system requirements to be aware of are as follows.

Chat with RTX requirements
Platform: Windows
GPU: GeForce RTX 30 or 40 series, or NVIDIA RTX Ampere/Ada generation GPU, with 8GB or more of VRAM
RAM: 16GB or more
OS: Windows 11
Driver: 535.11 or later

Windows 11 with 16GB or more of RAM is now common even for small PCs, and it also runs on a GeForce RTX 3060 (8GB), so the bar for local chat is not particularly high.

Beyond regular LLM chat, its distinctive feature is that text files on your PC can be used as a dataset. Various formats are supported, including plain text (.txt), PDF (.pdf), Word documents (.doc/.docx), and XML (.xml). If you specify a folder, all files of those types inside it are included in the dataset.

In addition, if you specify a YouTube URL (playlists included) as the dataset, you can also get answers about its content.
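NVIDIA does not document how this indexing works, but the usual pattern behind "point it at a folder" features is retrieval-augmented generation: chunk the files, embed the chunks, and pull the closest ones into the prompt. Here is a minimal conceptual sketch; embed() and generate() are hypothetical stand-ins for whatever embedding model and LLM the app actually uses, and real code would need proper PDF/Word parsers.

```python
# Conceptual sketch of a "folder as dataset" feature, NOT Chat with RTX's actual code.
# embed() and generate() are hypothetical stand-ins for the embedding model and the LLM.
import math
from pathlib import Path

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm or 1.0)

def build_index(folder, embed, chunk_size=1000):
    """Read every supported file under the folder and embed it in small chunks."""
    index = []
    for path in Path(folder).rglob("*"):
        if path.suffix.lower() in {".txt", ".pdf", ".doc", ".docx", ".xml"}:
            text = path.read_text(errors="ignore")  # real code would parse PDF/Word properly
            for i in range(0, len(text), chunk_size):
                chunk = text[i:i + chunk_size]
                index.append((embed(chunk), chunk, path.name))
    return index

def answer(question, index, embed, generate, top_k=3):
    """Retrieve the chunks closest to the question and hand them to the LLM."""
    q = embed(question)
    best = sorted(index, key=lambda item: -cosine(q, item[0]))[:top_k]
    context = "\n\n".join(chunk for _, chunk, _ in best)
    return generate(f"Answer using this context:\n{context}\n\nQuestion: {question}")
```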

Chat with RTX looks quite interesting, so let's install it right away.

Easy to install but huge file size!

Installation is super easy! There’s no need to search for anything or copy anything.

1. Download the zip file from the official page via the [Download Now] button
2. Extract the zip file
3. Run setup.exe inside

That's all there is to it. After installation, the app launches automatically.

You might think, “Wow! This is easy. Let’s try it out,” but since it comes in one package, the file size is huge.

The ZIP file alone is approximately 35GB; the extracted folder is approximately 38.3GB.

First, the zip file to download is approximately 35GB. Once extracted, it takes another 38.3GB or so, and the installation destination after running setup.exe came to roughly 61.7GB. If you keep working without deleting the zip or the extracted folder, you need about 135GB of space in total (35 + 38.3 + 61.7 ≈ 135GB).

Apart from that, installation is quick and easy, but on a machine with only 256GB or 512GB of storage it could be a tight squeeze.

Trying it out

The web UI is clean and organized, as shown above. Select the "AI model" in the upper left. There are two choices here: "Mistral 7B int4" and "Llama 2 13B int4". Since they say int4, they are presumably 4-bit quantized versions. I'm glad that Llama 2, which I like, is included as standard.

Select the AI model at the top left: two choices, Mistral 7B int4 and Llama 2 13B int4. The power button and dataset selection are at the top right.

On the right side are the power button and the dataset selection, where you can choose AI model default, Folder Path, or YouTube URL. The center is the chat area: type something in the box underneath, press [Send], and you get a reply. The base language is English, but I was also able to ask questions in Japanese.

The three buttons at the bottom are Regenerate / Undo / Clear. There are no system settings anywhere. Personally, I would like to be able to set something like host_name="0.0.0.0" and change the port so the UI could be reached from other PCs on the LAN; even on a company LAN, not everyone has a PC with a GeForce RTX.
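The interface looks like a typical Gradio-style web UI; assuming it is (NVIDIA doesn't say), what I'm wishing for is nothing more than the standard launch options other local UIs expose. A hypothetical sketch of what I mean, not an actual Chat with RTX setting:

```python
# Hypothetical: what LAN access would look like if launch settings were exposed.
# These are standard Gradio options, not documented Chat with RTX settings.
import gradio as gr

def reply(message):
    return "stub reply to: " + message  # stand-in for the real model call

demo = gr.Interface(fn=reply, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0",  # listen on all interfaces, not just localhost
            server_port=7860)       # any free port, so other PCs on the LAN can connect
```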

First, I asked the same question of Mistral 7B int4 and Llama 2 13B int4. When I asked, "Please give me a set of code samples using PHP's PDO," the answers were similar (the code itself was correct), but the former replied in Japanese. At first, even when I asked in Japanese, the reply came back in English, so perhaps something has changed. However, although it doesn't show in the screenshot, the output characters are sometimes garbled.

Mistral 7B int4: asked about PHP and PDO in Japanese. Initially the reply came back in English, but now it is in Japanese, and the code itself is correct. GPU VRAM usage is about 7.8GB. Llama 2 13B int4: same question, answered in English. GPU VRAM usage is approximately 11.0GB, more than the 7B model, as expected for 13B.

VRAM usage is approximately 7.8GB for Mistral 7B int4 and approximately 11GB for Llama 2 13B int4, so the former can run on an 8GB-class GPU.
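Those figures are roughly what you would expect from 4-bit weights. As a back-of-the-envelope check (ignoring the KV cache and runtime overhead, which account for the rest):

```python
# Rough weight-only VRAM estimate for int4 quantization (0.5 bytes per parameter).
for name, params in [("Mistral 7B", 7e9), ("Llama 2 13B", 13e9)]:
    gb = params * 0.5 / 1024**3
    print(f"{name}: ~{gb:.1f} GB of weights")  # ~3.3 GB and ~6.1 GB; cache/overhead add the rest
```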

As for speed, probably thanks to optimizations such as TensorRT-LLM, it is much faster than text-generation-webui in the same environment; replies appear almost instantly.

I tried other questions from previous articles, and the answers were generally the same and fairly accurate. As always, though, when asked "What is the second highest mountain in Japan?", the answer was "Mt. Fuji" (lol).

To turn local text files into a dataset, select Folder Path under Dataset on the right and specify the folder containing the files you want to use. Perhaps a bit recklessly, I specified a folder containing the complete set of this column's manuscripts that I wrote in 2023, all in Japanese.

Folder Path selected under Dataset, pointing at the folder containing the complete set of my 2023 manuscripts (in Japanese). Asked about the specifications of the MINISFORUM XTX 780, the answers were mostly correct, including OCuLink, but the file cited as the reference was 03MINISFORUM_UM773_lite.txt.

With the dataset loaded, I asked about the specifications of the MINISFORUM XTX 780, and it got the processor, OCuLink, 2.5GbE x 2, the tiger-print design, the 32GB/1TB configuration, and so on right, but for some reason it gave the price as "300,000 yen." I can't find that figure anywhere, so where it came from is a mystery. The file cited as the reference was also a different manuscript, which is another mystery.

Accuracy would probably be better in English, but I don't have a complete set of English files, so this is the result of building the dataset from manuscripts written in Japanese. Even so, it works well enough that there is real value in running an LLM locally.

To ask about the content of a YouTube video, choose YouTube URL under Dataset on the right, enter the URL, and click the [↓] button; after a little processing it shows Ready.
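NVIDIA doesn't explain the pipeline, but presumably the app works from the video's caption track rather than the video itself. A conceptual sketch using the third-party youtube-transcript-api package (my assumption about the approach, not Chat with RTX's actual code):

```python
# Conceptual sketch only: Chat with RTX does not document its YouTube pipeline.
# youtube-transcript-api is a third-party package (pip install youtube-transcript-api);
# this uses its classic static-method interface.
from youtube_transcript_api import YouTubeTranscriptApi

def transcript_text(video_id: str) -> str:
    """Fetch the caption track and flatten it into one plain-text document."""
    entries = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
    return " ".join(entry["text"] for entry in entries)

# The resulting text could then be chunked and indexed exactly like the
# local-folder sketch earlier, before asking the model about the video.
# text = transcript_text("VIDEO_ID")  # VIDEO_ID is a placeholder for any captioned video
```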

This time I set an English-language video about ComfyUI as the dataset and asked about "ComfyUI's features and how to install it," and the answer matched the video's content. Since it came back in English, I translated it into Japanese in the browser.

An English-language video about ComfyUI set as the dataset; asked about "ComfyUI's features and how to install it," the answer matched the video's content (translated into Japanese in the browser).

This is a very interesting feature with many possible uses. Beyond regular LLM chat, building datasets from local text and YouTube videos, especially the former, is something that can only be done locally (unless you upload the whole set to the cloud).

My one concern is that it does not support "context understanding," that is, remembering the content of the chat so that a later question about it gets an accurate answer. For example:

Person: I like apples and bananas. How about you?
AI: (gives some arbitrary answer, often just parroting the question back)
Person: What fruit do I like?

In a conversation like this, a system with context understanding will answer "apples and bananas," while one without it will say something unrelated. Chat with RTX is the latter: context is not carried over, so each question and answer stands alone, with no continuation into the next turn.

The AI's answers ignore the earlier turns: context is not understood.

Other local LLM setups also lacked this early on, but it was addressed long ago, and these days not supporting context understanding is unusual. To be fair, this is not a problem with the LLM itself; it is simply that the system's logic does not feed past conversation back into the prompt.

If a conversation runs long, the accumulated history becomes large (and runs into the maximum token count), so the oldest turns would need to be dropped, but I would like to see this addressed in an update.
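The fix is just plumbing: prepend the previous turns to each new prompt and drop the oldest ones when the history gets too long. A minimal sketch of that logic, with generate() as a hypothetical stand-in for the model call:

```python
# Sketch of the history handling the article is asking for; not Chat with RTX code.
def chat_with_memory(generate, max_history_chars=4000):
    """Keep past turns and prepend them to each prompt, dropping the oldest when too long."""
    history = []  # list of (question, answer) pairs

    def ask(question):
        context = "\n".join(f"User: {q}\nAI: {a}" for q, a in history)
        while len(context) > max_history_chars and history:
            history.pop(0)  # drop the oldest turn first (chars as a rough proxy for tokens)
            context = "\n".join(f"User: {q}\nAI: {a}" for q, a in history)
        answer = generate(f"{context}\nUser: {question}\nAI:")
        history.append((question, answer))
        return answer

    return ask

# ask = chat_with_memory(generate=my_model)   # my_model: whatever produces a reply
# ask("I like apples and bananas. How about you?")
# ask("What fruit do I like?")                # now answerable: the first turn is in the prompt
```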

An interesting announcement occurred just as I was writing this manuscript.

Lately I have been using Gemini more often than ChatGPT, and it seems Chat with RTX will soon support Google's open model, "Gemma." I am really looking forward to that day.

This concludes the introduction to NVIDIA's Chat with RTX. Although the lack of context understanding and the huge file size are drawbacks, being able to get this much running from a single setup.exe is probably what matters most for general users. I'd especially like to see an update on the former. If you have the hardware, please give it a try!
