OpenVINO DevCon - Generative AI Fundamentals With OpenVINO™
Smarter Inference:
Generative AI in 5 lines of code
Our Speakers
Agents
Autonomous Agents
Chatbots
Classification
Code Understanding
Code Writing
Evaluation
Extraction
Multi-Modal
QA Over Documents
Self-Checking
SQL
Summarization
Tagging
Chatbots
[Diagram: AI service architecture. A trained model runs in an AI service server module on the machine performing AI (a cloud server); a client module on the machine requesting the AI service (the client device) sends request data over the network.]
Cloud / Edge / AI PC
Optimum-Intel
(based on Transformers and Diffusers)
Stateful transformation
Compression / Quantization
model_id = "helenai/gpt2-ov"
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
Accuracy-Control Quantization
Quantization-Aware Training
Weight Compression
Activation-Aware Weight Quantization
Filter pruning, Binarization, Sparsity, ...
[Diagram: two execution schemes for MatMul.
Quantization: FP32 activations are quantized on the fly to INT8; with INT8 weights, the INT8 MatMul accumulates in INT32, which is dequantized on the fly back to FP32.
Weight compression: activations stay FP32; INT4/INT8/NF4 weights are decompressed (on the fly or offline) to FP32 before an FP32 MatMul.]
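As a rough numeric illustration of the quantization path (FP32 activations quantized on the fly, INT8 MatMul with an INT32 accumulator, dequantization back to FP32), here is a toy pure-Python sketch; the function names and the per-tensor scale choice are assumptions for illustration, not OpenVINO/NNCF code.

```python
# Toy illustration (assumed names, not OpenVINO/NNCF code) of the quantized
# MatMul path: FP32 activations are quantized on the fly to INT8, multiplied
# against INT8 weights with an INT32 accumulator, then dequantized to FP32.

def quantize(values, scale):
    """Symmetric INT8: round(v / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(value, scale):
    return value * scale

def int8_dot(activations, weights):
    a_scale = max(abs(v) for v in activations) / 127.0
    w_scale = max(abs(v) for v in weights) / 127.0
    a_q = quantize(activations, a_scale)         # on the fly
    w_q = quantize(weights, w_scale)             # done offline for real weights
    acc = sum(a * w for a, w in zip(a_q, w_q))   # INT8 x INT8 -> INT32
    return dequantize(acc, a_scale * w_scale)    # on the fly

approx = int8_dot([0.5, -1.0, 0.25], [1.0, 0.5, -0.5])
# the exact FP32 dot product is -0.125; the INT8 result lands close to it
```

The small residual error comes only from rounding to the 8-bit grid, which is why INT8 inference usually stays near FP32 accuracy.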
# 4-bit weight compression (Optimum-Intel):
ov_model = OVModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=OVWeightQuantizationConfig(bits=4, **model_compression_params))

# Run-time dynamic quantization with a group size of 32:
ov_model = OVModelForCausalLM.from_pretrained(
    model_path,
    ov_config={"DYNAMIC_QUANTIZATION_GROUP_SIZE": "32"})
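Group-wise weight compression like the 4-bit configuration above can be sketched with a toy example: each small group of weights shares one floating-point scale, values are stored in the 4-bit range [-7, 7], and decompression multiplies each stored value back by its group's scale. The names and the tiny group size are hypothetical, not NNCF's implementation.

```python
# Toy sketch of group-wise 4-bit weight compression (hypothetical names and a
# tiny group size; not NNCF's implementation). Each group shares one scale,
# so large and small weights do not have to share the same 4-bit grid.

def compress_int4(weights, group_size=4):
    groups = []
    for i in range(0, len(weights), group_size):
        chunk = weights[i:i + group_size]
        scale = max(abs(w) for w in chunk) / 7.0 or 1.0  # avoid zero scale
        groups.append((scale, [max(-7, min(7, round(w / scale))) for w in chunk]))
    return groups

def decompress(groups):
    return [v * scale for scale, q in groups for v in q]

weights = [0.02, -0.01, 0.03, 0.005, 1.5, -2.0, 0.7, 0.1]
restored = decompress(compress_int4(weights))
# per-group scales keep the error low even though magnitudes differ a lot
```

This is why the group size (32 in the config above) matters: smaller groups track local weight magnitudes more closely at the cost of storing more scales.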
[Figure: autoregressive decoding steps for a translation example. Step 1 feeds <sos>; step 2 feeds <sos> plus the first generated token ("Ich").]
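OpenVINO's stateful transformation keeps the growing KV cache inside the model between decoding steps. This toy model (illustrative only, not OpenVINO internals) shows why caching matters: with the cache, step N processes only the newest token, while a stateless model would reprocess all N tokens at every step.

```python
# Toy model (not OpenVINO internals) of why a stateful KV cache speeds up
# autoregressive decoding.

class ToyDecoder:
    def __init__(self):
        self.kv_cache = []   # cached (key, value) per past token
        self.token_ops = 0   # tokens actually processed

    def step(self, token):
        self.token_ops += 1                        # only the new token
        self.kv_cache.append((f"k{token}", f"v{token}"))
        return len(self.kv_cache)                  # attend over all cached KV

def cached_cost(n_steps):
    dec = ToyDecoder()
    for t in range(n_steps):
        dec.step(t)
    return dec.token_ops

def stateless_cost(n_steps):
    return sum(range(1, n_steps + 1))  # step i re-feeds all i tokens

# generating 5 tokens: 5 token ops with a KV cache vs 15 without
```

The gap grows quadratically with sequence length, which is why stateful execution dominates for long generations.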
model_id = "helenai/gpt2-ov"
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id,
+     ov_config={"KV_CACHE_PRECISION": "u8",
+                "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",
+                "PERFORMANCE_HINT": "LATENCY"})
tokenizer = AutoTokenizer.from_pretrained(model_id)
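Storing the KV cache at u8 precision (the `KV_CACHE_PRECISION` hint above) roughly quarters its memory versus FP32 while staying close in value. A toy sketch of asymmetric 8-bit quantization with a zero point follows; it is illustrative only, and OpenVINO's actual scheme may differ.

```python
# Toy sketch (illustrative only; OpenVINO's actual scheme may differ) of
# asymmetric u8 quantization: the FP32 range [min, max] is mapped onto
# [0, 255] via a scale and zero point, cutting cache memory about 4x.

def quantize_u8(values):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0          # avoid zero scale
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_u8(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]

kv = [-1.2, 0.0, 0.5, 2.3]
q, s, zp = quantize_u8(kv)
restored = dequantize_u8(q, s, zp)
# each restored value is within one quantization step (s) of the original
```

The zero point lets an unsigned 8-bit grid represent ranges that are not centered on zero, which suits cached activations better than a symmetric scheme.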
Pros:
▪ Cloud: special computing capabilities for optimal performance; large amounts of data and limitless compute on demand
▪ Edge: data independence; cost efficiency; increased control; autonomous execution
▪ AI PC: data independence; cost efficiency; increased control; energy consumption
▪ NPU Documentation
▪ Built-in GPU
▪ Intel® Core Ultra processor
100+ Demos
[Image: images generated from the "zebra" prompt]
https://docs.openvino.ai/2023.3/ovms_docs_python_support_reference.html
# requires the ovmsclient package (pip install ovmsclient)
from ovmsclient import make_grpc_client

# connect to an OpenVINO Model Server instance serving the "usem" model
client = make_grpc_client("localhost:9000")
data = ["dog", "Puppies are nice.", "I enjoy taking long walks along the beach with my dog."]
inputs = {"inputs": data}
results = client.predict(inputs=inputs, model_name="usem")
DEVCON Workshop Series 2024
When working on cloud/edge/PC,
what do you suggest?
www.openvino.ai
Installation
AI: The New Age
Solving the World’s Toughest
Challenges, Together.
Performance results are based on testing as of dates shown in configurations and may not reflect all
publicly available updates. See backup for configuration details.
Intel is committed to respecting human rights and avoiding complicity in human rights abuses. See
Intel's Global Human Rights Principles. Intel's products and software are intended only to be used in
applications that do not cause or contribute to a violation of an internationally recognized human right.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its
subsidiaries. Other names and brands may be claimed as the property of others.